Abstract. Many online, i.e., time-adaptive, inverse problems in signal processing and machine learning fall
under the wide umbrella of the asymptotic minimization of a sequence of non-negative, convex, and continuous
functions. To incorporate a-priori knowledge into the design, the asymptotic minimization task is usually
constrained on a fixed closed convex set, which is dictated by the available a-priori information. To increase
versatility towards the usage of the available information, the present manuscript extends the Adaptive Projected
Subgradient Method (APSM) by introducing an algorithmic scheme which incorporates a-priori knowledge in the
design via a sequence of strongly attracting quasi-nonexpansive mappings in a real Hilbert space. In such a way,
the benefits offered to online learning tasks by the proposed method unfold in two ways: 1) the rich class of
quasi-nonexpansive mappings provides a plethora of ways to cast a-priori knowledge, and 2) by introducing a sequence
of such mappings, the proposed scheme is able to capture the time-varying nature of a-priori information. The
convergence properties of the algorithm are studied, several special cases of the method with wide applicability
are shown, and the potential of the proposed scheme is demonstrated by considering an increasingly important,
nowadays, online sparse system/signal recovery task.

1. Introduction

Many online, i.e., time-adaptive, inverse problems in signal processing and machine learning can be recast
as follows [21[XnilS01IS2|ISSllSSllSnBQlSZll^SllSnilSSlISSllSSllSS]: if the non-negative integer $n \in \mathbb{N}$
denotes discrete time, and we have at our disposal a sequence of multidimensional data
$(a_n, d_n)_{n\in\mathbb{N}} \subset \mathbb{R}^L \times \mathbb{R}$, the objective of an online learning method
is to infer a possibly time-varying unknown mapping $x_* : \mathbb{R}^L \to \mathbb{R}$, which relates the
previous data under the following model:

$$ d_n = x_*(a_n) + \zeta_n, \quad \forall n \in \mathbb{N}. \tag{1} $$

In other words, at the $n$-th time instant, the $L$-dimensional input signal $a_n$ interacts with the signal/system
which underlies $x_*$, and our observation is the real-valued $d_n$, which is contaminated by the additive noise $\zeta_n$.

Online learning methods show distinct differences from their batch counterparts due to the following fundamental
reason: batch optimization methods are mobilized after all the necessary data are available to the
designer, whereas, in the online scenario, the sequential nature of the data $(a_n, d_n)_{n\in\mathbb{N}}$ dictates that at each
time instant $n$, the newly arriving $(a_n, d_n)$ should be efficiently incorporated into the learning process, without
the need of solving the optimization task from scratch. Such a sequential mode is not prescribed only by the
need for computational efficiency and savings. The online processing of data also becomes an efficient tool in
dynamic scenarios, where not only the probability density function of the input data $(a_n)_{n\in\mathbb{N}}$ changes
with time, but also the unknown mapping $x_*$ shows a time-varying nature. In such time-dependent
environments, and in order to monitor the time variations of the underlying signals and systems, the designer
is compelled to gradually disregard data which are associated with the remote past, and to put emphasis on
recently received $(a_n, d_n)$. It becomes clear that flexible and multifaceted online learning tools are needed
in order to deal with fast emerging signal processing and machine learning applications, like sparsity-aware
learning [21|lJJl|35l[30] , time-adaptive sensor networks pHEO], etc.

The unknown mapping $x_*$ of (1) could be either linear or non-linear. Our assumption on the linearity or not
of $x_*$ dictates the choice of possible spaces in which we perform our search for $x_*$. If $x_*$ is assumed linear,
then our working space becomes the classical Euclidean $\mathbb{R}^L$ [32, 47]. On the other hand, if $x_*$ is assumed
non-linear, a mathematically sound way to model a fairly large class of non-linear systems is to work in a possibly
infinite dimensional Reproducing Kernel Hilbert Space (RKHS) [3]; a strategy which has been particularly
successful in machine learning and pattern recognition tasks [TTl[30l[3Tl,HHl[S3l[H3lE3EH]. Since the Euclidean
$\mathbb{R}^L$ is a renowned Hilbert space, and in order to offer a unifying framework for linear and non-linear systems,
the stage of the following discussion will be based on a real Hilbert space $\mathcal{H}$.

Given an estimate $x \in \mathcal{H}$ of the unknown $x_*$, the most common way to validate $x$, with respect to the
model (1), is to penalize the disagreement of the observed output $d_n$ with $x(a_n)$, i.e., the real-valued difference
$x(a_n) - d_n$. A classical way to quantify such a perception of loss is to use the quadratic function in order to
form the penalty $(x(a_n) - d_n)^2$. The popularity of the quadratic loss function is based on its optimality in
estimation tasks where the contaminating noise process $(\zeta_n)_{n\in\mathbb{N}}$ is Gaussian [31]. However, in order to establish
a general framework for estimation problems, where the noise process is not constrained to be Gaussian, and
in order to build estimators which show robustness to a wide variety of outliers, we give ourselves the freedom
to employ any convex function $\mathcal{L} : \mathbb{R} \to [0, \infty)$, and not just the quadratic one, in order to quantify our
perception of loss (see for example [49]). Having the data $(a_n, d_n)$ as parameters in the design, the following
function is naturally defined on the space $\mathcal{H}$ of our estimates:
$\Theta_n : \mathcal{H} \to [0, \infty) : x \mapsto \mathcal{L}(x(a_n) - d_n)$. Due to
the online nature of the problem, i.e., the sequential data $(a_n, d_n)_{n\in\mathbb{N}}$, we end up with a sequence of loss functions
$(\Theta_n)_{n\in\mathbb{N}}$. We stress here that since $\mathcal{L}$ can be any convex function, $\Theta_n$ is not bound to be differentiable.
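The construction of $\Theta_n$ from a convex loss can be sketched in a few lines of Python for the linear case $x(a_n) = a_n^t x$. This is only an illustration under assumptions that are not part of the manuscript: the $\epsilon$-insensitive loss is chosen as one example of a non-differentiable convex $\mathcal{L}$, and the helper names `eps_insensitive`, `eps_insensitive_subgrad`, and `make_theta` are hypothetical.

```python
import numpy as np

def eps_insensitive(t, eps=0.1):
    """Example convex, non-negative, non-differentiable loss L(t) = max(0, |t| - eps)."""
    return max(0.0, abs(t) - eps)

def eps_insensitive_subgrad(t, eps=0.1):
    """One valid subgradient of the loss above at t."""
    return float(np.sign(t)) if abs(t) > eps else 0.0

def make_theta(a, d, loss, loss_subgrad):
    """Build Theta_n(x) = L(a^t x - d) and a subgradient map for a linear model.

    By the chain rule for an affine inner map, L'(a^t x - d) * a is a
    subgradient of Theta_n at x.
    """
    def theta(x):
        return loss(float(a @ x) - d)

    def subgrad(x):
        return loss_subgrad(float(a @ x) - d) * a

    return theta, subgrad

rng = np.random.default_rng(0)
a, d = rng.standard_normal(5), 0.3          # one synthetic data pair (a_n, d_n)
theta, subgrad = make_theta(a, d, eps_insensitive, eps_insensitive_subgrad)
```

Any other convex non-negative loss could be passed to `make_theta` in the same way, provided a matching subgradient routine is supplied.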

Theory, e.g., Bayesian inference [31], as well as everyday practice suggest that apart from the information
included in the training sequence $(a_n, d_n)_{n\in\mathbb{N}}$, estimation is enhanced if one also employs the a-priori knowledge
about the unknown system $x_*$. We will abide here by the set theoretic estimation approach [21] and quantify
the a-priori knowledge as a closed convex set $C$ in $\mathcal{H}$. The first attempt to attack the task of online learning
as the asymptotic minimization of a sequence $(\Theta_n)_{n\in\mathbb{N}}$, over a nonempty closed convex set $C$, was given in
[62, 63], by means of the following simple iteration, called the Adaptive Projected Subgradient Method (APSM);
for an arbitrary initial point $u_0 \in \mathcal{H}$, let

$$ \forall n \in \mathbb{N}, \quad u_{n+1} := \begin{cases} P_C\left(u_n - \lambda_n \dfrac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2}\, \Theta_n'(u_n)\right), & \text{if } \Theta_n'(u_n) \neq 0, \\ P_C(u_n), & \text{if } \Theta_n'(u_n) = 0, \end{cases} $$

where $\lambda_n \in (0,2)$, $P_C$ stands for the metric projection mapping onto $C$, and $\Theta_n'(u_n)$ denotes any subgradient
of $\Theta_n$ at $u_n$, $\forall n \in \mathbb{N}$. The previous recursion is a time-adaptive generalization of the classical algorithm
of Polyak [33], which deals with the minimization problem of a fixed, non-smooth, convex and continuous
function $\Theta$ over $C$. Besides the new directions for online learning [58], the previous recursion has also offered
a unification of several standard algorithms in classical adaptive filtering [32, 47]. Indeed, by letting $\mathcal{H} := \mathbb{R}^L$,
for an appropriately chosen sequence $(\Theta_n)_{n\in\mathbb{N}}$, and by substituting $P_C$ with the identity mapping, the previous
recursion [62, 63, 66] results in the classical Normalized Least Mean Squares (NLMS) [HE] and the, vastly
used nowadays, Affine Projection Algorithm (APA) [33"ll4*3].
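The reduction of the APSM recursion to NLMS can be checked numerically. The following Python sketch assumes one particular choice of loss, $\Theta_n(x) = |a_n^t x - d_n|$, which is an assumption made here for illustration (the text only states that the sequence is "appropriately chosen"); the function names `apsm_step` and `nlms_step` are likewise hypothetical. With this choice and $P_C$ equal to the identity, one APSM step with $\lambda_n = \mu$ coincides with the textbook NLMS update with step size $\mu$.

```python
import numpy as np

def apsm_step(u, theta, subgrad, lam, project=lambda v: v):
    """One APSM iteration; `project` plays the role of P_C (identity by default)."""
    g = subgrad(u)
    g_norm_sq = float(g @ g)
    if g_norm_sq == 0.0:                      # zero subgradient: only project
        return project(u)
    return project(u - lam * theta(u) / g_norm_sq * g)

def nlms_step(u, a, d, mu):
    """Textbook NLMS update, written independently of the APSM form."""
    e = float(a @ u) - d
    return u - mu * e / float(a @ a) * a

rng = np.random.default_rng(0)
L, mu = 6, 0.7
u_apsm = rng.standard_normal(L)
u_nlms = u_apsm.copy()
for n in range(20):
    a, d = rng.standard_normal(L), float(rng.standard_normal())
    # Assumed loss: Theta_n(x) = |a^t x - d|, with subgradient sign(a^t x - d) * a.
    theta = lambda x, a=a, d=d: abs(float(a @ x) - d)
    subgrad = lambda x, a=a, d=d: np.sign(float(a @ x) - d) * a
    u_apsm = apsm_step(u_apsm, theta, subgrad, lam=mu)
    u_nlms = nlms_step(u_nlms, a, d, mu)
```

Passing a genuine projection as `project` turns the same routine into the constrained APSM step discussed above.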

It is often the case that a single closed convex set $C$, or even better, a single metric projection mapping
$P_C$, cannot capture the diversity of the a-priori knowledge in signal processing applications. For example,
in a robust beamforming problem [55], the a-priori knowledge is usually expressed as $C = \bigcap_{m=1}^M C_m$, where
$\{C_m\}_{m=1}^M$ is a number of closed convex sets, with associated projection mappings $\{P_{C_m}\}_{m=1}^M$ that are usually
easy to compute. However, an analytic expression for $P_C$ might not be available. Moreover, erroneous
a-priori information may result in an empty $C = \bigcap_{m=1}^M C_m = \emptyset$ [SUEZ]. How is it possible to deal with
multiple closed convex sets $\{C_m\}_{m=1}^M$ where an analytical expression of $P_C$ is not available, or the $\{C_m\}_{m=1}^M$
share an empty intersection? Avoiding the straightforward and recently popular solution of relaxing the
original constraints, the study in [56] provides a solution to the previous problem and extends [62, 63] by
using a mapping $T$, in the place of $P_C$, which belongs to the general class of strongly attracting nonexpansive
mappings. Indeed, the method [56] demonstrated its potential in a wide variety of online learning tasks, which
span from classical linear adaptive filtering [67] to non-linear classification and regression tasks [58].

It is natural to ask now whether we can add more freedom to the usage of the a-priori knowledge. Our
motivation is based on a couple of elementary observations. First, given the well-known fact that a nonempty
closed convex set $C$ is the set of all minimizers of the distance function $d(\cdot, C)$ to $C$, one of the ways to
visualize a-priori knowledge could be the set of all minimizers of a generally non-smooth convex function
defined on an appropriate Hilbert space $\mathcal{H}$. Secondly, it is often the case in practice that a minimizer of
a convex function cannot be reached either by an analytical formula or by a computationally cheap process. A
powerful mapping, whose recursive application is known to minimize a generally non-differentiable convex
function, is the subgradient projection mapping [6, 7, 64]. It is also known that this operator belongs to the
class of quasi-nonexpansive mappings [51171164], which strictly contains all the strongly attracting nonexpansive
mappings, utilized in [56]. Now, the question arises naturally: does the APSM still operate when constrained
by the general class of quasi-nonexpansive mappings, and can we, thus, devise a method with more freedom
in incorporating a-priori information than in the studies of [56, 62, 63]? Given the wide applicability of the
APSM in online learning tasks [58], it is anticipated that such a generalization will add further flexibility to
the APSM in order to tackle more challenging online learning tasks, which have been recently emerging both
in signal processing and machine learning [2, 17, 19, 20, 35, 40].

The present manuscript introduces an extension of the APSM [56, 62, 63] towards a more flexible usage of
the a-priori information, in two ways: 1) by considering a strictly larger class of mappings than in [56, 62, 63],
and in particular, operators taken from the rich family of quasi-nonexpansive mappings, and 2) by letting these
mappings be time-varying in order to capture the dynamic nature of the a-priori information, which is quite
often met in signal processing and machine learning applications. Put in mathematical terms, the problem to be
studied is the following.

Problem 1 (Constrained asymptotic minimization task). Given a sequence of convex, continuous, and not
necessarily differentiable functions $(\Theta_n : \mathcal{H} \to [0, \infty))_{n\in\mathbb{N}}$, and a sequence of strongly attracting
quasi-nonexpansive mappings $(T_n : \mathcal{H} \to \mathcal{H})_{n\in\mathbb{N}}$, with nonempty fixed point sets $(\operatorname{Fix}(T_n))_{n\in\mathbb{N}}$, we are looking
for a sequence $(u_n)_{n\in\mathbb{N}}$ that asymptotically minimizes $(\Theta_n)_{n\in\mathbb{N}}$ over $(\operatorname{Fix}(T_n))_{n\in\mathbb{N}}$. Strictly speaking, our
objective is to generate a $(u_n)_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty} \Theta_n(u_n) = 0$, and the set of its strong cluster points
$\mathfrak{S}((u_n)_{n\in\mathbb{N}})$ lies in $\limsup_{n\to\infty} \operatorname{Fix}(T_n)$, i.e., $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} \operatorname{Fix}(T_n)$.
Our algorithmic tool to tackle the previous optimization task is the following.
Algorithm 1. Given an arbitrary initial point $u_0 \in \mathcal{H}$, generate the following sequence:

$$ \forall n \in \mathbb{N}, \quad u_{n+1} := \begin{cases} T_n\left(u_n - \lambda_n \dfrac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2}\, \Theta_n'(u_n)\right), & \text{if } \Theta_n'(u_n) \neq 0, \\ T_n(u_n), & \text{if } \Theta_n'(u_n) = 0, \end{cases} \tag{2} $$

where $\lambda_n \in (0, 2)$ and $\Theta_n'(u_n)$ stands for any subgradient of $\Theta_n$ at $u_n$, $\forall n \in \mathbb{N}$.
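A minimal Python sketch of recursion (2) in $\mathbb{R}^L$ is given next. Everything specific in it is an assumption made for illustration and not taken from the manuscript: the loss $\Theta_n(u) = |a_n^t u - d_n|$, noiseless synthetic data, and a fixed mapping $T_n := T$ chosen as the metric projection onto a closed ball (a firmly nonexpansive mapping, hence strongly attracting quasi-nonexpansive). The names `algorithm1_step` and `ball_projection` are hypothetical. Since the unknown vector lies in every $\operatorname{Fix}(T_n) \cap \operatorname{lev}_{\leq 0}\Theta_n$ in this setup, the distance of the iterates to it should not increase.

```python
import numpy as np

def algorithm1_step(u, theta, subgrad, lam, T):
    """One iteration of recursion (2): a relaxed subgradient step followed by T_n."""
    g = subgrad(u)
    g_norm_sq = float(g @ g)
    if g_norm_sq == 0.0:                      # second branch of (2)
        return T(u)
    return T(u - lam * theta(u) / g_norm_sq * g)

def ball_projection(radius):
    """Metric projection onto the closed ball B[0, radius]; its fixed point set is the ball."""
    def T(v):
        norm = float(np.linalg.norm(v))
        return v if norm <= radius else (radius / norm) * v
    return T

rng = np.random.default_rng(1)
L = 8
x_star = rng.standard_normal(L)               # synthetic unknown system
T = ball_projection(radius=float(np.linalg.norm(x_star)) + 0.5)
u = np.zeros(L)
errors = [float(np.linalg.norm(u - x_star))]
for n in range(200):
    a = rng.standard_normal(L)
    d = float(a @ x_star)                     # noiseless observation (assumption)
    theta = lambda v, a=a, d=d: abs(float(a @ v) - d)
    subgrad = lambda v, a=a, d=d: np.sign(float(a @ v) - d) * a
    u = algorithm1_step(u, theta, subgrad, lam=1.0, T=T)
    errors.append(float(np.linalg.norm(u - x_star)))
```

A time-varying constraint could be modeled by passing a different `T` at each iteration; the single ball used here is only the simplest choice.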

The manuscript is organized as follows. A series of necessary definitions and facts are included in Section 2.
The algorithm and its convergence analysis follow in Section 3. Special cases of the algorithm, with a wide
application range in online learning, can be found in Section 4. The potential of the method is shown in
Section 5 by introducing a low-complexity time-adaptive learning technique for the increasingly important,
nowadays, sparse system/signal recovery task.

2. Preliminaries 

We start with several notations which will be frequently used in the sequel. 

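The set of all non-negative integers, positive integers, and real numbers will be denoted by $\mathbb{N}$, $\mathbb{N}^*$, and $\mathbb{R}$,
respectively. The set of all subsequences of $\mathbb{N}$ will be denoted by $\mathcal{N}_\infty^\#$, i.e., $\mathcal{N}_\infty^\# := \{N \subset \mathbb{N} : N \text{ is infinite}\}$
[46]. Any $N \in \mathcal{N}_\infty^\#$ can be also denoted in the standard way as $N = (n_k)_{k\in\mathbb{N}}$. Define, also, $\mathcal{N}_\infty := \{N \subset \mathbb{N} :
\mathbb{N} \setminus N \text{ is finite}\}$ [56]. In other words, $\mathcal{N}_\infty$ contains all the "neighborhoods of $\infty$", with respect to $\mathbb{N}$, while
$\mathcal{N}_\infty^\#$ is its associated "grill" [16].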

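Henceforth, the symbol $\mathcal{H}$ will stand for a real Hilbert space, equipped with an inner product $\langle \cdot, \cdot \rangle$, and a
norm $\|\cdot\| := \sqrt{\langle \cdot, \cdot \rangle}$. In the case where $\mathcal{H}$ becomes the Euclidean $\mathbb{R}^L$, $L \in \mathbb{N}^*$, any element of $\mathbb{R}^L$ will be denoted
by boldfaced symbols. The inner product of $\mathbb{R}^L$ will be the classical vector dot product, i.e., $\langle \boldsymbol{v}_1, \boldsymbol{v}_2 \rangle := \boldsymbol{v}_1^t \boldsymbol{v}_2$,
$\forall \boldsymbol{v}_1, \boldsymbol{v}_2 \in \mathbb{R}^L$, where the superscript $t$ stands for vector/matrix transposition.

Given an $x \in \mathcal{H}$ and a $\rho > 0$, an open ball is defined as the set $B(x, \rho) := \{v \in \mathcal{H} : \|x - v\| < \rho\}$, while a
closed ball is $B[x, \rho] := \{v \in \mathcal{H} : \|x - v\| \leq \rho\}$. Given $S, T \subset \mathcal{H}$, the relative interior of $S$ with respect to $T$ is
defined as $\operatorname{ri}_T S := \{v \in S : \exists \rho > 0,\ \emptyset \neq (B(v, \rho) \cap T) \subset S\}$. The interior of $S$ is defined as $\operatorname{int} S := \operatorname{ri}_{\mathcal{H}} S$.

Given $S \subset \mathcal{H}$, define the distance function to $S$ as follows: $d(\cdot, S) : \mathcal{H} \to [0, \infty) : x \mapsto d(x, S) :=
\inf\{\|x - v\| : v \in S\}$. Given any nonempty closed convex set $C \subset \mathcal{H}$, the (metric) projection onto $C$ is defined
as the mapping $P_C : \mathcal{H} \to C$ which maps to an $x \in \mathcal{H}$ the (unique) $P_C(x) \in C$ such that $\|x - P_C(x)\| = d(x, C)$.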

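Definition 2 (Subdifferential and subgradient). Given a convex function $\Theta : \mathcal{H} \to \mathbb{R}$, the subdifferential of
$\Theta$ is defined as the set-valued mapping:

$$ \partial\Theta : \mathcal{H} \to 2^{\mathcal{H}} : x \mapsto \partial\Theta(x) := \{u \in \mathcal{H} : \forall y \in \mathcal{H},\ \langle u, y - x \rangle + \Theta(x) \leq \Theta(y)\}. $$

In the case where $\Theta$ is continuous at $x$, then $\partial\Theta(x) \neq \emptyset$ [29]. Any element in $\partial\Theta(x)$ will be called a subgradient
of $\Theta$ at $x$, and will be denoted by $\Theta'(x)$. If $\Theta$ is Gateaux differentiable at $x$, then $\partial\Theta(x)$ becomes a singleton,
and the unique element of $\partial\Theta(x)$ is nothing but the classical Gateaux differential of $\Theta$ at $x$. Notice, also, the
well-known fact: $0 \in \partial\Theta(x) \Leftrightarrow x \in \operatorname{argmin}_{v\in\mathcal{H}} \Theta(v)$.

Example 3. The subdifferential of the metric distance function to a closed convex set $C \subset \mathcal{H}$ is given as
follows:

$$ \partial d(x, C) = \begin{cases} N_C(x) \cap B[0, 1], & \text{if } x \in C, \\ \left\{\dfrac{x - P_C(x)}{d(x, C)}\right\}, & \text{if } x \in \mathcal{H} \setminus C, \end{cases} $$

where $N_C(x) := \{v \in \mathcal{H} : \forall y \in C,\ \langle v, y - x \rangle \leq 0\}$. Notice that $\forall x \in \mathcal{H}$, $\forall d'(x, C) \in \partial d(x, C)$, $\|d'(x, C)\| \leq 1$.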
Definition 4 ([SHUE]). Given a mapping $T : \mathcal{H} \to \mathcal{H}$, the set of all fixed points of $T$, i.e., $\operatorname{Fix}(T) := \{v \in
\mathcal{H} : T(v) = v\}$, is called the fixed point set of $T$. Assume a $T : \mathcal{H} \to \mathcal{H}$ such that $\operatorname{Fix}(T) \neq \emptyset$. The mapping
$T$ will be called quasi-nonexpansive if $\forall x \in \mathcal{H}$, $\forall v \in \operatorname{Fix}(T)$, $\|T(x) - v\| \leq \|x - v\|$. It can be verified that the
fixed point set of a quasi-nonexpansive mapping is closed and convex, e.g., [6, Prop. 2.3 and 2.6]. If

$$ \exists \eta > 0 : \forall x \in \mathcal{H},\ \forall v \in \operatorname{Fix}(T), \quad \eta \|x - T(x)\|^2 \leq \|x - v\|^2 - \|T(x) - v\|^2, $$

then $T$ will be called $\eta$-attracting or strongly attracting quasi-nonexpansive.

Now, if $\forall x, y \in \mathcal{H}$, $\|T(x) - T(y)\| \leq \|x - y\|$, then $T$ will be called nonexpansive. In the case where $T$
is both nonexpansive and strongly attracting quasi-nonexpansive, then it will be called strongly attracting
nonexpansive.

In particular, a 1-attracting (quasi-)nonexpansive mapping will be called firmly (quasi-)nonexpansive.

Fact 5 (Equivalent description of strongly attracting quasi-nonexpansive mappings [6CHI64]). The following
statements are equivalent for a mapping $T : \mathcal{H} \to \mathcal{H}$.

1. $T$ is $\eta$-attracting quasi-nonexpansive.

2. $T$ is $\frac{1}{1+\eta}$-averaged quasi-nonexpansive. A mapping $T$ is called $\alpha$-averaged quasi-nonexpansive, with $\alpha \in
(0, 1)$, if there exists a quasi-nonexpansive mapping $R : \mathcal{H} \to \mathcal{H}$ such that $T = (1 - \alpha)I + \alpha R$.

In particular, $T$ is firmly quasi-nonexpansive iff $T$ is $\frac{1}{2}$-averaged quasi-nonexpansive. Notice that $\forall \alpha \in (0, 1)$,
$\operatorname{Fix}(T) = \operatorname{Fix}(R)$, which suggests that given a quasi-nonexpansive mapping $R$, we can always construct a
strongly attracting quasi-nonexpansive $T$ that shares the same fixed point set with $R$.

Example 6 (Subgradient projection mapping). Given a convex continuous function $\Theta$, such that $\operatorname{lev}_{\leq 0}\Theta :=
\{v \in \mathcal{H} : \Theta(v) \leq 0\} \neq \emptyset$, define the subgradient projection mapping $T_\Theta : \mathcal{H} \to \mathcal{H}$ with respect to $\Theta$ as follows:

$$ T_\Theta(x) := \begin{cases} x - \dfrac{\Theta(x)}{\|\Theta'(x)\|^2}\, \Theta'(x), & \text{if } x \notin \operatorname{lev}_{\leq 0}\Theta, \\ x, & \text{if } x \in \operatorname{lev}_{\leq 0}\Theta, \end{cases} $$

where $\Theta'(x)$ is any subgradient in $\partial\Theta(x)$. If $I$ stands for the identity mapping in $\mathcal{H}$, the mapping

$$ T_\Theta^{(\lambda)} := I + \lambda(T_\Theta - I), \quad \lambda \in (0, 2), $$

will be called the relaxed subgradient projection mapping with respect to $\Theta$. It can be verified that $\forall \lambda \in
(0, 2)$, $\operatorname{Fix}(T_\Theta^{(\lambda)}) = \operatorname{Fix}(T_\Theta) = \operatorname{lev}_{\leq 0}\Theta$ [6]. Moreover, $\forall \lambda \in (0, 2)$, the mapping $T_\Theta^{(\lambda)}$ is $\frac{2-\lambda}{\lambda}$-attracting
quasi-nonexpansive [6].

Example 7 (Relaxed metric projection mapping). Let a nonempty closed convex set $C \subset \mathcal{H}$ and its associated
metric projection mapping $P_C$. Then, the relaxed (metric) projection mapping, $T_C^{(\alpha)} := I + \alpha(P_C - I)$,
$\alpha \in (0, 2)$, is $\frac{2-\alpha}{\alpha}$-attracting nonexpansive with fixed point set $\operatorname{Fix}(T_C^{(\alpha)}) = C$ [5].

Example 8 ([5, 63]). Let $T_1, T_2$ be $\eta_1$- and $\eta_2$-attracting (quasi-)nonexpansive mappings, respectively. Assume
also that $\operatorname{Fix}(T_1) \cap \operatorname{Fix}(T_2) \neq \emptyset$. Then, the mapping $T_1 T_2$ is $\frac{\eta_1 \eta_2}{\eta_1 + \eta_2}$-attracting (quasi-)nonexpansive, and

$$ \operatorname{Fix}(T_1 T_2) = \operatorname{Fix}(T_1) \cap \operatorname{Fix}(T_2). $$

Definition 9 (Demiclosed mapping at 0). A mapping $T : \mathcal{H} \to \mathcal{H}$ will be called demiclosed at $0$ if the following
property holds; for a sequence $(x_n)_{n\in\mathbb{N}} \subset \mathcal{H}$, and an $x_* \in \mathcal{H}$,

$$ \text{if } \begin{cases} x_n \rightharpoonup x_*, \\ T(x_n) \to 0, \end{cases} \quad \text{then } T(x_*) = 0, $$

where the symbols $\rightharpoonup$ and $\to$ denote weak and strong convergence in $\mathcal{H}$, respectively.
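The relaxed subgradient projection mapping of Example 6 and its attracting property from Definition 4 can be illustrated numerically. The Python sketch below makes its own assumptions for the sake of a concrete instance: the function $\Theta(x) = \|x\|_1 - \rho$ on $\mathbb{R}^L$ (so that $\operatorname{lev}_{\leq 0}\Theta$ is an $\ell_1$-ball), the subgradient $\operatorname{sign}(x)$, and the helper name `relaxed_subgradient_projection` are all hypothetical choices, not taken from the manuscript.

```python
import numpy as np

def relaxed_subgradient_projection(x, theta, subgrad, lam=1.0):
    """T_Theta^(lam)(x) = x + lam * (T_Theta(x) - x), following Example 6."""
    val = theta(x)
    if val <= 0.0:                            # x already lies in lev_{<=0} Theta
        return x
    g = subgrad(x)                            # nonzero whenever theta(x) > 0
    return x - lam * val / float(g @ g) * g

rho = 1.0
theta = lambda x: float(np.sum(np.abs(x))) - rho   # assumed example: l1-ball constraint
subgrad = lambda x: np.sign(x)                     # one subgradient of the l1-norm

rng = np.random.default_rng(3)
lam = 1.5
eta = (2.0 - lam) / lam                       # attracting constant stated in Example 6
gaps = []
for _ in range(200):
    x = 3.0 * rng.standard_normal(6)
    v = rng.standard_normal(6)
    v = 0.9 * rho * v / float(np.sum(np.abs(v)))   # a point of the level set
    Tx = relaxed_subgradient_projection(x, theta, subgrad, lam)
    lhs = eta * float(np.sum((x - Tx) ** 2))
    rhs = float(np.sum((x - v) ** 2)) - float(np.sum((Tx - v) ** 2))
    gaps.append(rhs - lhs)                    # non-negative if T is eta-attracting
```

Note that a single application of the mapping does not, in general, land inside the level set; it only moves the point closer to every element of it, which is exactly what the inequality of Definition 4 quantifies.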
Example 10 ([MJ Lem. 2]). If $T : \mathcal{H} \to \mathcal{H}$ is a nonexpansive mapping, then $I - T$ is demiclosed at $0$.

Example 11 ([6, Prop. 6.10], [60]). Let a continuous convex function $\Theta : \mathcal{H} \to \mathbb{R}$ such that $\operatorname{lev}_{\leq 0}\Theta \neq \emptyset$.
Then, $\forall \lambda \in (0, 2)$, the mapping $I - T_\Theta^{(\lambda)}$ is demiclosed at $0$, where $T_\Theta^{(\lambda)}$ stands for the relaxed subgradient
projection mapping with respect to $\Theta$.

Fact 12 ([63]). Assume a sequence $(x_n)_{n\in\mathbb{N}} \subset \mathcal{H}$, and a closed convex set $C \subset \mathcal{H}$. Assume that

$$ \exists \kappa > 0 : \forall v \in C,\ \forall n \in \mathbb{N}, \quad \kappa \|x_{n+1} - x_n\|^2 \leq \|x_n - v\|^2 - \|x_{n+1} - v\|^2. $$

If there exists, also, a hyperplane $\Pi$ such that $\operatorname{ri}_\Pi C \neq \emptyset$, then $\exists x_* \in \mathcal{H}$ such that $x_* = \lim_{n\to\infty} x_n$.

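Definition 13 (Inner and outer limits [UHS]). Given a sequence of subsets $(S_n)_{n\in\mathbb{N}} \subset \mathcal{H}$, define the inner
and outer limits:

$$ \begin{aligned} \liminf_{n\to\infty} S_n &:= \Big\{x \in \mathcal{H} : \exists N \in \mathcal{N}_\infty,\ \exists x_n \in S_n,\ \forall n \in N, \text{ such that } \lim_{n\in N} x_n = x\Big\} \\ &= \Big\{x \in \mathcal{H} : \limsup_{n\to\infty} d(x, S_n) = 0\Big\} = \bigcap_{N \in \mathcal{N}_\infty^\#} \overline{\bigcup_{n\in N} S_n} = \bigcap_{\epsilon > 0} \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} (S_k + B[0, \epsilon]), \end{aligned} \tag{3} $$

$$ \begin{aligned} \limsup_{n\to\infty} S_n &:= \Big\{x \in \mathcal{H} : \exists N \in \mathcal{N}_\infty^\#,\ \exists x_n \in S_n,\ \forall n \in N, \text{ such that } \lim_{n\in N} x_n = x\Big\} \\ &= \Big\{x \in \mathcal{H} : \liminf_{n\to\infty} d(x, S_n) = 0\Big\} = \bigcap_{N \in \mathcal{N}_\infty} \overline{\bigcup_{n\in N} S_n} = \bigcap_{\epsilon > 0} \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} (S_k + B[0, \epsilon]), \end{aligned} \tag{4} $$

where $S_k + B[0, \epsilon] := \{s + b : s \in S_k,\ b \in B[0, \epsilon]\}$, and the overline symbol stands for the closure of a set. In
a similar fashion, given a sequence of subsets $(S_n)_{n\in\mathbb{N}}$, and a subsequence $N = (n_k)_{k\in\mathbb{N}} \in \mathcal{N}_\infty^\#$, the notation
$\liminf_{n\in N} S_n$ is defined as $\liminf_{k\to\infty} S_{n_k}$. Likewise, $\limsup_{n\in N} S_n := \limsup_{k\to\infty} S_{n_k}$.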



3. The Analysis of the Algorithm 

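3.1. A useful theorem. Prior to the analysis of Algorithm 1, we state and prove Theorem 15, which will be
repeatedly used in the sequel. The proof of Theorem 15 will be based on the following assumption.

Assumption 14. Assume a sequence of mappings $(T_n : \mathcal{H} \to \mathcal{H})_{n\in\mathbb{N}}$ with nonempty fixed point sets
$(\operatorname{Fix}(T_n))_{n\in\mathbb{N}}$. For any subsequence $N \in \mathcal{N}_\infty^\#$, for any sequence $(x_n)_{n\in N} \subset \mathcal{H}$, and for any $\gamma > 0$ such
that $\forall n \in N$, $d(x_n, \operatorname{Fix}(T_n)) \geq \gamma$, there exists a $\delta > 0$ such that $\liminf_{n\in N} \|(I - T_n)(x_n)\| \geq \delta$.

Theorem 15. Assume a sequence of mappings $(T_n : \mathcal{H} \to \mathcal{H})_{n\in\mathbb{N}}$, with nonempty fixed point sets $(\operatorname{Fix}(T_n))_{n\in\mathbb{N}}$,
such that Assumption 14 is satisfied.

1. Assume a subsequence $N \in \mathcal{N}_\infty^\#$, a sequence $(x_n)_{n\in N} \subset \mathcal{H}$, and an $x_* \in \mathcal{H}$.

$$ \text{If } \begin{cases} \lim_{n\in N} x_n = x_*, \\ \lim_{n\in N} (I - T_n)(x_n) = 0, \end{cases} \quad \text{then } x_* \in \liminf_{n\in N} \operatorname{Fix}(T_n). $$

2. Let $\mathfrak{S}((x_n)_{n\in\mathbb{N}})$ be the set of all strong cluster points of a sequence $(x_n)_{n\in\mathbb{N}}$.

$$ \text{If } \begin{cases} \mathfrak{S}((x_n)_{n\in\mathbb{N}}) \neq \emptyset, \\ \lim_{n\to\infty} (I - T_n)(x_n) = 0, \end{cases} \quad \text{then } \mathfrak{S}((x_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} \operatorname{Fix}(T_n). $$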

Proof. 1. We will prove Theorem 15.1 by contradiction, i.e., assume that $x_* \notin \liminf_{n\in N} \operatorname{Fix}(T_n)$.

By (3), $\limsup_{n\in N} d(x_*, \operatorname{Fix}(T_n)) > 0$, i.e., there exist a $\tau > 0$ and an $N' \in \mathcal{N}_\infty^\#$ such that $\forall n \in N' \cap N$,
$d(x_*, \operatorname{Fix}(T_n)) \geq \tau$.

Moreover, since $\lim_{n\in N} x_n = x_*$, there exists an $N_0 \in \mathcal{N}_\infty$ such that $\forall n \in N_0 \cap N$, $\|x_* - x_n\| \leq \frac{\tau}{2}$.
Having these in mind, the triangle inequality $\|x_* - v\| \leq \|x_* - x_n\| + \|x_n - v\|$, $\forall v \in \operatorname{Fix}(T_n)$, leads us to
the following:

$$ \forall n \in N_0 \cap N' \cap N, \quad d(x_n, \operatorname{Fix}(T_n)) \geq d(x_*, \operatorname{Fix}(T_n)) - \|x_* - x_n\| \geq \tau - \frac{\tau}{2} =: \gamma > 0. $$

Hence, there exists a subsequence $N'' := N_0 \cap N' \cap N \in \mathcal{N}_\infty^\#$ such that $\forall n \in N''$, $d(x_n, \operatorname{Fix}(T_n)) \geq \gamma$.
Now, by Assumption 14, there exists a $\delta > 0$ such that

$$ 0 < \delta \leq \liminf_{n\in N''} \|(I - T_n)(x_n)\| = \lim_{n\in N''} \|(I - T_n)(x_n)\| = 0, $$

where the last two equalities come from the fact that $N'' \subset N$. This contradiction establishes Theorem 15.1.

2. Choose arbitrarily an $x_* \in \mathfrak{S}((x_n)_{n\in\mathbb{N}})$. By definition, there exists a subsequence $N \in \mathcal{N}_\infty^\#$ such that
$\lim_{n\in N} x_n = x_*$. Hence, by Theorem 15.1, $x_* \in \liminf_{n\in N} \operatorname{Fix}(T_n)$. By Definition 13, $\exists N_0 \in \mathcal{N}_\infty$ and
$\exists x_n' \in \operatorname{Fix}(T_n)$, $\forall n \in N \cap N_0$, such that $\lim_{n\in N\cap N_0} x_n' = x_*$.

Clearly, $N' := N \cap N_0 \in \mathcal{N}_\infty^\#$. In other words, $\exists N' \in \mathcal{N}_\infty^\#$, $\exists x_n' \in \operatorname{Fix}(T_n)$, $\forall n \in N'$, such that $\lim_{n\in N'} x_n' =
x_*$, i.e., $x_* \in \limsup_{n\to\infty} \operatorname{Fix}(T_n)$ by Definition 13. Since $x_*$ was chosen arbitrarily, Theorem 15.2 is
established. □

Next is an example of a sequence of mappings which satisfies Assumption 14, and which will be used later
on in the sequel. Another example of a family of mappings which satisfies Assumption 14, and which relates
to the minimization of an $\ell_1$-norm loss function, will be seen in Lemma 26.4.

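Example 16. Assume a sequence of nonempty closed convex sets $(S_n)_{n\in\mathbb{N}}$, the associated sequence of relaxed
metric projection mappings

$$ T_{S_n}^{(\alpha_n)} := I + \alpha_n (P_{S_n} - I), \quad \alpha_n \in (0, 2),\ \forall n \in \mathbb{N}, $$

and the existence of a sufficiently small $\epsilon > 0$ such that $\alpha_n \in [\epsilon, 2)$, $\forall n \in \mathbb{N}$. Then, the sequence of mappings
$(T_{S_n}^{(\alpha_n)})_{n\in\mathbb{N}}$ satisfies Assumption 14.

Proof. First of all, by Example 7, $\forall n \in \mathbb{N}$, $\operatorname{Fix}(T_{S_n}^{(\alpha_n)}) = S_n$. Choose, now, arbitrarily an $N \in \mathcal{N}_\infty^\#$, a sequence
$(x_n)_{n\in N} \subset \mathcal{H}$, and a $\gamma > 0$, such that $\forall n \in N$, $d(x_n, \operatorname{Fix}(T_{S_n}^{(\alpha_n)})) = d(x_n, S_n) \geq \gamma$. Then, it is easy to verify by
the definition of $T_{S_n}^{(\alpha_n)}$ that

$$ \forall n \in N, \quad \big\|(I - T_{S_n}^{(\alpha_n)})(x_n)\big\| = \alpha_n\, d(x_n, S_n) \geq \epsilon\gamma > 0. $$

Therefore, there exists a $\delta > 0$ such that $\liminf_{n\in N} \|(I - T_{S_n}^{(\alpha_n)})(x_n)\| \geq \delta$, and Assumption 14 is established.

□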
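The identity $\|(I - T_{S_n}^{(\alpha_n)})(x_n)\| = \alpha_n d(x_n, S_n)$ used in the previous proof can be checked numerically. The Python sketch below assumes, purely for illustration, that each $S_n$ is a closed ball $B[0, r_n]$ in $\mathbb{R}^3$ with a time-varying radius, and that every $x_n$ is placed at distance $\gamma = 1$ from $S_n$; the helper names are hypothetical.

```python
import numpy as np

def project_ball(x, radius):
    """Metric projection onto the closed ball B[0, radius]."""
    norm = float(np.linalg.norm(x))
    return x if norm <= radius else (radius / norm) * x

def relaxed_projection(x, radius, alpha):
    """T^(alpha) = I + alpha * (P_S - I) for S = B[0, radius]."""
    return x + alpha * (project_ball(x, radius) - x)

def dist_to_ball(x, radius):
    """Distance d(x, S) for S = B[0, radius]."""
    return max(0.0, float(np.linalg.norm(x)) - radius)

rng = np.random.default_rng(2)
eps, gamma = 0.2, 1.0
residuals, predicted = [], []
for n in range(50):
    radius = 1.0 + 0.5 * np.sin(n)            # time-varying set S_n (assumption)
    alpha = rng.uniform(eps, 2.0)             # alpha_n drawn from [eps, 2)
    direction = rng.standard_normal(3)
    x = (radius + gamma) * direction / np.linalg.norm(direction)  # d(x, S_n) = gamma
    residuals.append(float(np.linalg.norm(x - relaxed_projection(x, radius, alpha))))
    predicted.append(alpha * dist_to_ball(x, radius))
```

In this setup every residual is bounded below by `eps * gamma`, which is the uniform bound that Assumption 14 asks for.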
3.2. The Main Analysis. Given a sequence of convex, continuous, and not necessarily differentiable functions
$(\Theta_n : \mathcal{H} \to [0, \infty))_{n\in\mathbb{N}}$, and a sequence of $\eta_n$-attracting quasi-nonexpansive mappings $(T_n : \mathcal{H} \to \mathcal{H})_{n\in\mathbb{N}}$, with
$\eta_n > 0$, $\forall n \in \mathbb{N}$, and with nonempty fixed point sets $(\operatorname{Fix}(T_n))_{n\in\mathbb{N}}$, the convergence analysis of Algorithm 1,
given in Theorem 18, will be based on the following series of assumptions.

Assumption 17.

1. There exists an $N \in \mathcal{N}_\infty$ such that $\forall n \in N$, $\Omega_n := \operatorname{Fix}(T_n) \cap \operatorname{lev}_{\leq 0}\Theta_n \neq \emptyset$.

2. There exists an $N \in \mathcal{N}_\infty$ such that $\Omega := \bigcap_{n\in N} \Omega_n \neq \emptyset$.

3. Choose an $\epsilon \in (0, 1]$, and let $\forall n \in N$, $\lambda_n \in [\epsilon, 2 - \epsilon]$.

4. The sequence $(\Theta_n'(u_n))_{n\in N}$ is bounded.

5. Define $\check{\eta} := \inf\{\eta_n : n \in N\}$, $\hat{\eta} := \sup\{\eta_n : n \in N\}$. Then, assume that $\check{\eta} > 0$ and $\hat{\eta} < \infty$.

6. The sequence of relaxed subgradient projection mappings $(T_{\Theta_n}^{(\lambda_n)})_{n\in N}$ satisfies Assumption 14.

7. The sequence of mappings $(T_n)_{n\in N}$ satisfies Assumption 14.

8. Assume that $\forall n \in N$, $T_n := T$, where $T$ is a strongly attracting quasi-nonexpansive mapping with $\operatorname{Fix}(T) \neq
\emptyset$, and $I - T$ is demiclosed at $0$.

9. The set $\mathfrak{S}((u_n)_{n\in\mathbb{N}})$ of all strong cluster points of the sequence $(u_n)_{n\in\mathbb{N}}$ is nonempty.

10. There exists a hyperplane $\Pi$ such that $\operatorname{ri}_\Pi(\Omega) \neq \emptyset$.

Theorem 18 (Properties of Algorithm 1).

1. Let Assumption 17.1 hold true. Then, $\forall n \in N$, $d(u_{n+1}, \Omega_n) \leq d(u_n, \Omega_n)$.

2. Let Assumption 17.2 hold true. Then, $\forall n \in N$, $d(u_{n+1}, \Omega) \leq d(u_n, \Omega)$.

3. Let Assumption 17.2 hold true. Then, $\forall v \in \Omega$, the sequence $(\|u_n - v\|)_{n\in\mathbb{N}}$ converges.

4. Let Assumption 17.2 hold true. Then, the set of all weakly sequential cluster points of the sequence $(u_n)_{n\in\mathbb{N}}$
is nonempty, i.e., $\mathfrak{W}((u_n)_{n\in\mathbb{N}}) \neq \emptyset$.

5. Let Assumptions 17.2 and 17.3 hold true. Then,

$$ \lim_{n\to\infty} \big\|(I - T_{\Theta_n}^{(\lambda_n)})(u_n)\big\| = \lim_{n\to\infty} \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|} = 0, $$

where, in order to avoid ambiguities, we let $\frac{0}{0} := 0$.

6. Let Assumptions 17.2, 17.3, and 17.4 hold true. Then, $\lim_{n\to\infty} \Theta_n(u_n) = 0$.

7. Let Assumptions 17.2, 17.3, 17.6, and 17.9 hold true. Then, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} \operatorname{lev}_{\leq 0}\Theta_n$. If, in
addition, the set $\mathfrak{S}((u_n)_{n\in\mathbb{N}})$ is a singleton, i.e., there exists a $u_*$ such that $\{u_*\} = \mathfrak{S}((u_n)_{n\in\mathbb{N}})$, then
$u_* \in \liminf_{n\to\infty} \operatorname{lev}_{\leq 0}\Theta_n$.

8. Let Assumptions 17.2 and 17.5 hold true. Then, $\lim_{n\to\infty} (I - T_n)(T_{\Theta_n}^{(\lambda_n)}(u_n)) = 0$.

9. Let Assumptions 17.2, 17.3, 17.5, 17.7, and 17.9 hold true. Then, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} \operatorname{Fix}(T_n)$. If,
in addition, the set $\mathfrak{S}((u_n)_{n\in\mathbb{N}})$ is a singleton, i.e., there exists a $u_*$ such that $\{u_*\} = \mathfrak{S}((u_n)_{n\in\mathbb{N}})$, then
$u_* \in \liminf_{n\to\infty} \operatorname{Fix}(T_n)$.

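10. Let Assumptions 17.2, 17.3, and 17.8 hold true. Then, $\mathfrak{W}((u_n)_{n\in\mathbb{N}}) \subset \operatorname{Fix}(T)$.

11. Let Assumptions 17.2, 17.3, 17.8, and 17.9 hold true. Then, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \operatorname{Fix}(T)$.

12. Let Assumptions 17.2, 17.3, 17.5, and 17.10 hold true. Then, $\exists u_* \in \mathcal{H} : \lim_{n\to\infty} u_n = u_*$, i.e., $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) =
\{u_*\}$.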

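Proof. 1. By Assumption 17.1, $\forall n \in N$, $\operatorname{lev}_{\leq 0}\Theta_n \neq \emptyset$. Recall also the fundamental fact that $0 \in \partial\Theta_n(u_n) \Leftrightarrow
u_n \in \operatorname{argmin}_{v\in\mathcal{H}} \Theta_n(v)$.

Fix any $n \in N$. Consider the case where $u_n \notin \operatorname{lev}_{\leq 0}\Theta_n \Leftrightarrow \Theta_n(u_n) > 0 \Rightarrow \Theta_n'(u_n) \neq 0$. Then, by (2),
$u_{n+1} = T_n\big(u_n - \lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2} \Theta_n'(u_n)\big)$. Now, assume that $u_n \in \operatorname{lev}_{\leq 0}\Theta_n \Leftrightarrow \Theta_n(u_n) = 0$. If $\Theta_n'(u_n) = 0$,
then by (2), $u_{n+1} = T_n(u_n)$. On the other hand, if $\Theta_n'(u_n) \neq 0$, then, again, $u_{n+1} = T_n(u_n)$, since
$\Theta_n(u_n) = 0$. To summarize, (2) takes the following form:

$$ \forall n \in N, \quad u_{n+1} = \begin{cases} T_n\left(u_n - \lambda_n \dfrac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2}\, \Theta_n'(u_n)\right), & \text{if } u_n \notin \operatorname{lev}_{\leq 0}\Theta_n, \\ T_n(u_n), & \text{if } u_n \in \operatorname{lev}_{\leq 0}\Theta_n. \end{cases} $$

If we combine this result with Example 6, then it can be easily verified that the previous recursion can be
equivalently viewed as follows: $\forall n \in N$, $u_{n+1} = T_n T_{\Theta_n}^{(\lambda_n)}(u_n)$, where $T_{\Theta_n}^{(\lambda_n)}$ stands for the relaxed subgradient
projection mapping w.r.t. $\Theta_n$.

Now, since $T_{\Theta_n}^{(\lambda_n)}$ is a $\frac{2-\lambda_n}{\lambda_n}$-attracting quasi-nonexpansive mapping, with $\operatorname{Fix}(T_{\Theta_n}^{(\lambda_n)}) = \operatorname{lev}_{\leq 0}\Theta_n$, it can
be easily verified by Example 8 that the mapping $T_n T_{\Theta_n}^{(\lambda_n)}$ is $\frac{(2-\lambda_n)\eta_n}{2-\lambda_n(1-\eta_n)}$-attracting quasi-nonexpansive, with
$\operatorname{Fix}(T_n T_{\Theta_n}^{(\lambda_n)}) = \operatorname{Fix}(T_n) \cap \operatorname{Fix}(T_{\Theta_n}^{(\lambda_n)}) = \operatorname{Fix}(T_n) \cap \operatorname{lev}_{\leq 0}\Theta_n = \Omega_n$, $\forall n \in N$. Hence, by Definition 4, we have
that $\forall n \in N$, $\forall v \in \Omega_n$,

$$ \begin{aligned} 0 \leq \frac{(2-\lambda_n)\eta_n}{2-\lambda_n(1-\eta_n)} \|u_n - u_{n+1}\|^2 &= \frac{(2-\lambda_n)\eta_n}{2-\lambda_n(1-\eta_n)} \big\|u_n - T_n T_{\Theta_n}^{(\lambda_n)}(u_n)\big\|^2 \\ &\leq \|u_n - v\|^2 - \big\|T_n T_{\Theta_n}^{(\lambda_n)}(u_n) - v\big\|^2 = \|u_n - v\|^2 - \|u_{n+1} - v\|^2 \end{aligned} \tag{5} $$

$$ \Rightarrow \|u_{n+1} - v\| \leq \|u_n - v\|. \tag{6} $$

If we apply $\inf_{v\in\Omega_n}$ on both sides of (6), then we obtain Theorem 18.1.

2. Due to Assumption 17.2, to the fact that $\Omega$ is closed and convex, to $P_\Omega(u_n) \in \Omega \subset \Omega_n$, $\forall n \in N$, and to (6),
we have:

$$ \forall n \in N, \quad d(u_n, \Omega) = \|u_n - P_\Omega(u_n)\| \geq \|u_{n+1} - P_\Omega(u_n)\| \geq \|u_{n+1} - P_\Omega(u_{n+1})\| = d(u_{n+1}, \Omega), \tag{7} $$

which is nothing but Theorem 18.2.

3. Fix arbitrarily $v \in \Omega$. By (6), the sequence $(\|u_n - v\|)_{n\in N}$ is non-increasing and bounded; hence convergent.
This establishes Theorem 18.3.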

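4. Since $(u_n)_{n\in\mathbb{N}}$ is bounded by Theorem 18.3, $\mathfrak{W}((u_n)_{n\in\mathbb{N}}) \neq \emptyset$ [27, Thm. 9.12]. This establishes Theorem 18.4.

5. There is no loss of generality if we assume that $\forall n \in N$, $\Theta_n'(u_n) \neq 0$. To see this, notice that for all $n \in N$
such that $\Theta_n'(u_n) = 0$, we obtain $\Theta_n(u_n) = 0 \Rightarrow \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|} = \frac{0}{0} := 0$. Hence, in such a case, the claim of
Theorem 18.5 holds true.

Assume, now, any $v \in \Omega$. Recall also that the mapping $T_n$ is quasi-nonexpansive, with $\Omega \subset \operatorname{Fix}(T_n)$,
$\forall n \in N$, and easily verify that $\forall n \in N$, $\forall v \in \Omega$,

$$ \begin{aligned} \|u_{n+1} - v\|^2 &= \left\|T_n\left(u_n - \lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2}\, \Theta_n'(u_n)\right) - v\right\|^2 \leq \left\|(u_n - v) - \lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2}\, \Theta_n'(u_n)\right\|^2 \\ &= \|u_n - v\|^2 + \lambda_n^2 \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2} - 2\lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2} \langle u_n - v, \Theta_n'(u_n) \rangle. \end{aligned} \tag{8} $$

By the definition of the subgradient, we have that $\langle v - u_n, \Theta_n'(u_n) \rangle + \Theta_n(u_n) \leq \Theta_n(v) = 0$. If we merge
this into (8), we obtain the following:

$$ \|u_{n+1} - v\|^2 \leq \|u_n - v\|^2 + \lambda_n^2 \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2} - 2\lambda_n \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2} = \|u_n - v\|^2 - \lambda_n(2 - \lambda_n) \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2}. $$

This implies in turn that

$$ \forall n \in N,\ \forall v \in \Omega, \quad 0 \leq \epsilon^2 \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2} \leq \lambda_n(2 - \lambda_n) \frac{\Theta_n^2(u_n)}{\|\Theta_n'(u_n)\|^2} \leq \|u_n - v\|^2 - \|u_{n+1} - v\|^2. $$

However, by Theorem 18.3 the sequence $(\|u_n - v\|^2)_{n\in\mathbb{N}}$ is convergent, and hence Cauchy. The definition
of a Cauchy sequence implies that $\lim_{n\to\infty}(\|u_n - v\|^2 - \|u_{n+1} - v\|^2) = 0$. This fact and the previous
inequality establish $\lim_{n\to\infty} \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|} = 0$.
Now, notice that for all $n \in N$:

$$ \big\|u_n - T_{\Theta_n}^{(\lambda_n)}(u_n)\big\| = \lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|} \leq 2\, \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|}. $$

Take $\lim_{n\to\infty}$ on both sides of this inequality, and recall the previous result to easily verify that

$$ \lim_{n\to\infty} \big\|(I - T_{\Theta_n}^{(\lambda_n)})(u_n)\big\| = 0. $$

In other words, Theorem 18.5 holds true.

6. Since the sequence $(\Theta_n'(u_n))_{n\in N}$ is assumed bounded, there exists a $D > 0$ such that $\forall n \in N$, $\|\Theta_n'(u_n)\| \leq D$.
Notice, now, that for all those $n \in N$ such that $\Theta_n'(u_n) \neq 0$, we have

$$ \Theta_n(u_n) = \|\Theta_n'(u_n)\| \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|} \leq D\, \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|}. \tag{9} $$

Moreover, for all those $n \in N$ such that $\Theta_n'(u_n) = 0$, it is clear by the well-known fact $0 \in \partial\Theta_n(u_n) \Leftrightarrow
u_n \in \operatorname{argmin}_{v\in\mathcal{H}} \Theta_n(v)$, that $\Theta_n(u_n) = 0$. If we take $\lim_{n\to\infty}$ on both sides of (9), and if we also recall
Theorem 18.5, the claim is established.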

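7. Notice that $\forall n \in N$, $\operatorname{Fix}(T_{\Theta_n}^{(\lambda_n)}) = \operatorname{lev}_{\leq 0}\Theta_n$. Hence, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} \operatorname{lev}_{\leq 0}\Theta_n$ is a direct
consequence of Theorems 15 and 18.5. The claim for the case of $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) = \{u_*\}$ can be easily obtained if we
let $N := \mathbb{N}$ in Theorem 15.1.

8. Here we will use Definition 4 two times; one for the mapping $T_n$, and one for $T_{\Theta_n}^{(\lambda_n)}$. In other words, $\forall n \in N$,
$\forall v \in \Omega$,

$$ \begin{aligned} \check{\eta}\, \big\|(I - T_n)(T_{\Theta_n}^{(\lambda_n)}(u_n))\big\|^2 &\leq \eta_n \big\|T_{\Theta_n}^{(\lambda_n)}(u_n) - T_n T_{\Theta_n}^{(\lambda_n)}(u_n)\big\|^2 \\ &\leq \big\|T_{\Theta_n}^{(\lambda_n)}(u_n) - v\big\|^2 - \big\|T_n T_{\Theta_n}^{(\lambda_n)}(u_n) - v\big\|^2 \\ &= \big\|T_{\Theta_n}^{(\lambda_n)}(u_n) - v\big\|^2 - \|u_{n+1} - v\|^2 \\ &\leq \|u_n - v\|^2 - \|u_{n+1} - v\|^2. \end{aligned} $$

Divide the above inequality by $\check{\eta} > 0$, recall Theorem 18.3, and take $\lim_{n\to\infty}$ on both sides of the resulting
inequality to obtain $\lim_{n\to\infty} (I - T_n)(T_{\Theta_n}^{(\lambda_n)}(u_n)) = 0$. This establishes Theorem 18.8.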
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9. First, since 6((u n ) n& ^) ^ 0, notice that (3((7@ ™^(tt re )) n6 N) = 6((u n ) ne fq). To establish, for example, 
< 3(('"n)neN) C &((T^ in \u n )) n gn), choose arbitrarily a u* G &((u n ) n ^), which implies that there exists a 
subsequence N' G N* such that lim ng jv' u n = u*. Then, it is easy to verify that 



Vn G N', 



< Ilit* - u n \\ + 



(I-Tt ] )(u n ) 



Take lim ng 7V' on both sides of the previous inequality, so that the following result is obtained by Theo- 
rem 118151 G &((TQ n \u n )) n ^). Similar arguments can be used in order to derive &((T^ n \u n )) n ^) C 

Now, it becomes clear under the previous discussion, that if we define x n := Tg^ (u n ), Vn G N, in 
Theorem [T51 then Theorem 118191 becomes a direct consequence of Theorems [TS] and 118181 

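10. Theorem 18.4 guarantees that $\mathfrak{W}((u_n)_{n\in\mathbb{N}}) \neq \emptyset$. Fix arbitrarily a $u_* \in \mathfrak{W}((u_n)_{n\in\mathbb{N}})$. By definition, there
exists a subsequence $N \in \mathcal{N}_\infty^{\#}$ such that $u_n \rightharpoonup_{n\in N} u_*$.

Recall Theorem 18.5 and easily verify that $u_n - T_{\Theta_n}^{(\lambda_n)}(u_n) \to_{n\in N} 0$. This together with $u_n \rightharpoonup_{n\in N} u_*$
imply that

$$T_{\Theta_n}^{(\lambda_n)}(u_n) \rightharpoonup_{n\in N} u_*. \qquad (10)$$

Recall, now, Theorem 18.8 in order to obtain $(I - T)(T_{\Theta_n}^{(\lambda_n)}(u_n)) \to_{n\in N} 0$. This result, (10), and Definition 9
lead us to $(I - T)(u_*) = 0 \Leftrightarrow u_* \in \mathrm{Fix}(T)$. This establishes Theorem 18.10.

11. This is a direct consequence of Theorem 18.10 and the well-known fact that $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \mathfrak{W}((u_n)_{n\in\mathbb{N}})$.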

12. It is easy to verify by Assumptions 17.3 and 17.5 that

(2 - X n )r] n _ (2 - X n )r] n > ei) Q 



2-A n (l-7? n ) (2 - A n ) + A n r/ n " 2(1 + 77) 
Using also (5), we easily verify under Assumption 17.2 that

Vn G N,Vv G O, — ^— ||u n - -u n+ i|| 2 < ||n n - u|| 2 - ||n n+ i - u|| 2 . (11) 
2(1 + 77) 

The claim of Theorem 18.12 is a direct consequence of (11), Assumption 17.10 and Fact 12. □

4. Special Cases of the General Algorithm 

4.1. Exploring $(T_n)_{n\in\mathbb{N}}$. The available a-priori information about the model (1) enters Algorithm 1 through
the sequence of mappings $(T_n)_{n\in\mathbb{N}}$, i.e., implicitly via the sequence of sets $(\mathrm{Fix}(T_n))_{n\in\mathbb{N}}$. Given that $n \in \mathbb{N}$
stands for time, the sequence $(T_n)_{n\in\mathbb{N}}$ aims to capture the dynamic nature of a-priori information, which
is usually met in signal processing and machine learning applications. For example, it is often the case
in adaptive signal processing to face a channel whose impulse response changes slowly with time. Notice
also here that the sequence $(T_n)_{n\in\mathbb{N}}$ belongs to the rich family of strongly attracting quasi-nonexpansive
mappings. To demonstrate the versatility offered by this class of mappings in the usage of the available
a-priori knowledge, examples of such mappings, mobilized extensively in various contexts of optimization theory
[7], are demonstrated in this section. More specifically, in order to apply the proposed scheme to a real-world
problem, the following Definition 23 considers a non-smooth loss function which infuses sparsity information in
(1). Such a loss function will be incorporated in Algorithm 29 to devise an algorithmic solution to the online
sparse system/signal recovery task of Section 5.

Example 19 (Resolvent). For a set-valued mapping $A : \mathcal{H} \to 2^{\mathcal{H}}$, its graph is defined as the set $\mathrm{gph}(A) :=
\{(x,y) \in \mathcal{H} \times \mathcal{H} : y \in A(x)\}$. The mapping $A$ will be called monotone if $\forall (x_1,y_1), (x_2,y_2) \in \mathrm{gph}(A)$,
$\langle x_1 - x_2, y_1 - y_2 \rangle \ge 0$ [3, 8, 38, 46]. A monotone mapping $A$ will be called maximal if no enlargement of
its graph is possible without destroying monotonicity, i.e., $\forall (x,y) \in \mathcal{H} \times \mathcal{H} \setminus \mathrm{gph}(A)$, there exists a pair
$(x_0,y_0) \in \mathrm{gph}(A)$ such that $\langle x - x_0, y - y_0 \rangle < 0$ [3, 8, 38, 46]. For example, the linear mapping induced by any
positive semi-definite matrix is maximal monotone [46, Examples 12.2 and 12.7].

Now, given a maximal monotone mapping $A : \mathcal{H} \to 2^{\mathcal{H}}$, and a $\xi > 0$, its resolvent $(I + \xi A)^{-1} : \mathcal{H} \to \mathcal{H}$
is a 1-attracting nonexpansive mapping, where $(\cdot)^{-1}$ stands for the inverse of a mapping. The fixed point set
of the resolvent is $\{x \in \mathcal{H} : 0 \in A(x)\}$. For example, in the case of a positive semi-definite matrix,
this fixed point set is nothing but the null space of the matrix.
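
As a small numerical illustration of the last remark, the following sketch applies the resolvent of a $2\times 2$ positive semi-definite matrix to two vectors. The matrix, the value of $\xi$, and the helper name `resolvent_2x2` are assumptions chosen only for this example; they do not appear in the original text.

```python
def resolvent_2x2(A, xi, x):
    """Apply (I + xi*A)^{-1} to x, for a 2x2 matrix A given as nested lists."""
    # Entries of M = I + xi*A.
    m00 = 1.0 + xi * A[0][0]
    m01 = xi * A[0][1]
    m10 = xi * A[1][0]
    m11 = 1.0 + xi * A[1][1]
    det = m00 * m11 - m01 * m10  # strictly positive when A is positive semi-definite
    # Closed-form inverse of a 2x2 matrix, applied to x.
    return [(m11 * x[0] - m01 * x[1]) / det,
            (-m10 * x[0] + m00 * x[1]) / det]


# Illustrative positive semi-definite matrix whose null space is span{(1, -1)}.
A = [[1.0, 1.0], [1.0, 1.0]]
xi = 0.5

fixed = resolvent_2x2(A, xi, [2.0, -2.0])  # a null-space vector is left unchanged
moved = resolvent_2x2(A, xi, [1.0, 1.0])   # a vector outside the null space is shrunk
```

In this toy case the null-space vector is a fixed point of the resolvent, while the second vector is mapped to a different point, in line with the description of the fixed point set above.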

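Example 20 (Proximity mapping). Given a convex, lower semi-continuous function $\Phi : \mathcal{H} \to \mathbb{R}$, the Moreau envelope
of index $\gamma > 0$ of $\Phi$ is the function

$$x \mapsto \min_{y \in \mathcal{H}} \Big( \Phi(y) + \frac{1}{2\gamma} \|x - y\|^2 \Big). \qquad (12)$$

Then, the proximity mapping $T_{\gamma\Phi}$ is defined as the mapping which maps to an $x \in \mathcal{H}$ the unique minimizer
of (12) [23, 24, 39]. It can be verified that the proximity mapping $T_{\gamma\Phi}$ is 1-attracting nonexpansive with fixed
point set $\mathrm{Fix}(T_{\gamma\Phi}) = \{x \in \mathcal{H} : \Phi(x) = \inf_{y\in\mathcal{H}} \Phi(y)\}$ [23, 24].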
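
A minimal sketch of a proximity mapping, under the assumption (made only for illustration) that $\mathcal{H} = \mathbb{R}$ and $\Phi = |\cdot|$: in that case the minimizer of (12) is the well-known soft-thresholding rule. The function names are hypothetical.

```python
def prox_abs(x, gamma):
    """Proximity mapping of index gamma of the absolute value on the real line."""
    # Soft-thresholding: shrink x towards zero by gamma, and clip at zero.
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def moreau_objective(y, x, gamma):
    """Objective of the minimization over y that defines the Moreau envelope of |.|."""
    return abs(y) + (x - y) ** 2 / (2.0 * gamma)


p = prox_abs(3.0, 1.0)     # 3 is shrunk by gamma = 1
zero = prox_abs(0.0, 1.0)  # the minimizer of |.| is a fixed point
```

The only fixed point of `prox_abs` is the origin, which is also the unique minimizer of the absolute value, matching the characterization of the fixed point set given in Example 20.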

Example 21 (Inconsistent a-priori information). Assume that the available a-priori knowledge about our 
system is a gathering of several pieces of information which take the form of the following nonempty closed 
convex sets: T, {C m }£f =1 in U, with M G N*. With V we denote the information that our system should 
surely satisfy, called the absolute or hard constraint. Ideally, our solution set is T n (0^=1^™)- However, 
it is quite often the case that the available pieces of a-priori knowledge are inconsistent, i.e., the previous 
intersection is the empty set, e.g., [55]. To tackle such a problem, we define the following proximity function: 
Mx G H, p(x) := Y^m=i f3m,d 2 (x, C m ), where {/3 m }^f =1 are convex weights, i.e., {/3 m }^f =1 C (0,1], such that 
X^m=i Pm = 1- The proximity function is everywhere Frechet differentiable, and its differential is the mapping 
p' := 2^^ =1 /3 m (I — Pc m ) '■ H —> W- Define, now, as our new solution set S := arg min{p(x) : x € T}. The 
non-emptiness of S is guaranteed if at least one of {C m }^f =1 or T is bounded [UJ. In words, E is the set of all 
those points in T that least violate, in the sense of the previous proximity function, the rest of the constraints 
{C m }m=i- Under the previous setting, and VA G (0,2), the mapping T p := Pr(I — Ap'), is (1 — ^-attracting 
nonexpansive with fixed point set Fix(T p ) = 3 ^8 | [22 |[6"iy651 [67] . 

Example 22 (The class $\mathfrak{T}$ of mappings [6]). For any $x, y \in \mathcal{H}$, define the following set: $H(x,y) := \{v \in \mathcal{H} :
\langle x - y, v - y \rangle \le 0\}$. In words, the set $H(x,y)$ is the closed halfspace onto which $y$ is the metric projection of $x$.
Now, a mapping $T : \mathcal{H} \to \mathcal{H}$ is said to belong to the class $\mathfrak{T}$ of mappings, if $\forall x \in \mathcal{H}$, $\mathrm{Fix}(T) \subset H(x, T(x))$ [6].
An equivalent description of the class $\mathfrak{T}$ is as follows: $T \in \mathfrak{T}$ iff $T$ is firmly quasi-nonexpansive [6, Proposition
2.3]. Moreover, $\forall T \in \mathfrak{T}$, $\mathrm{Fix}(T) = \bigcap_{x\in\mathcal{H}} H(x, T(x))$. For example, the subgradient projection mapping $T_\Theta$
(Example 6) belongs to this class [6, Proposition 2.3].

Definition 23 (Sparsity-aware loss function). Henceforth, the notation $\overline{j_1, j_2}$, for any integers $j_1 \le j_2$, will
stand for $\{j_1, j_1 + 1, \ldots, j_2\}$. Assume that $\mathcal{H} := \mathbb{R}^L$, for some $L \in \mathbb{N}^*$. We introduce, here, the following
sequence of convex, continuous, non-negative functions $(\Phi_n : \mathcal{H} \to [0,\infty))_{n\in\mathbb{N}}$. Given a sequence of weight
vectors $(w_n)_{n\in\mathbb{N}} \subset \mathbb{R}^L$, with positive components, i.e., $w_{n,j} > 0$, $\forall j \in \overline{1,L}$, $\forall n \in \mathbb{N}$, and a positive parameter
$\rho > 0$, we define

$$\forall x \in \mathbb{R}^L, \quad \Phi_n(x) := \max\Big\{0, \sum_{j=1}^{L} w_{n,j} |x_j| - \rho\Big\}. \qquad (13)$$

It is clear that the 0-th level set for each $\Phi_n$ is a weighted $\ell_1$-ball, i.e.,

$$\forall n \in \mathbb{N}, \quad \mathrm{lev}_{\le 0}\Phi_n = B_{\ell_1}[w_n, \rho] := \Big\{x \in \mathbb{R}^L : \sum_{j=1}^{L} w_{n,j} |x_j| \le \rho\Big\}.$$

The fixed point set of the relaxed subgradient projection mapping $T_{\Phi_n}^{(\nu_n)}$, $\nu_n \in (0,2)$, is the weighted $\ell_1$-ball,
i.e., $\mathrm{Fix}(T_{\Phi_n}^{(\nu_n)}) = B_{\ell_1}[w_n, \rho]$. The sequence $(B_{\ell_1}[w_n, \rho])_{n\in\mathbb{N}}$ has been very useful in building sparsity-aware
online learning methods in [35, 51, 52]. There, the metric projection mapping $P_{B_{\ell_1}[w_n,\rho]}$ was employed, whose
computation scales to the order of $\mathcal{O}(L \log_2 L)$.
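
The relaxed subgradient projection mapping associated with a weighted $\ell_1$-ball can be sketched as follows. The sketch assumes the standard form $x \mapsto x - \nu\,\Phi(x)\,\Phi'(x)/\|\Phi'(x)\|^2$ whenever $\Phi(x) > 0$ (and the identity otherwise), with $\Phi(x) = \max\{0, \sum_j w_j|x_j| - \rho\}$ and the subgradient $\Phi'(x) = [w_j\,\mathrm{sgn}(x_j)]_j$, where $\mathrm{sgn}(0) := 0$ is one admissible choice. Function names are hypothetical.

```python
def phi(x, w, rho):
    """Sparsity-aware loss: violation of the weighted l1-ball constraint."""
    return max(0.0, sum(wj * abs(xj) for wj, xj in zip(w, x)) - rho)


def relaxed_subgradient_projection(x, w, rho, nu=1.0):
    """One relaxed subgradient projection step with respect to phi; O(L) operations."""
    val = phi(x, w, rho)
    if val <= 0.0:
        return list(x)  # x already lies in the weighted l1-ball
    # A subgradient of phi at x (sgn(0) := 0 is an admissible choice).
    g = [wj * ((xj > 0) - (xj < 0)) for wj, xj in zip(w, x)]
    scale = nu * val / sum(gj * gj for gj in g)  # g is nonzero whenever val > 0
    return [xj - scale * gj for xj, gj in zip(x, g)]


inside = relaxed_subgradient_projection([0.5, 0.5], [1.0, 1.0], 2.0)
outside = relaxed_subgradient_projection([3.0, 1.0], [1.0, 1.0], 2.0)
```

No sorting is required here, which is the source of the linear complexity, in contrast to the $\mathcal{O}(L \log_2 L)$ cost of the exact metric projection mentioned above.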

Following a different path than [35, 51, 52], the information carried by $(B_{\ell_1}[w_n, \rho])_{n\in\mathbb{N}}$ is viewed from an
alternative angle in this study: $\forall n \in \mathbb{N}$, $B_{\ell_1}[w_n, \rho]$ is not just a closed convex set, onto which we project,
but it is also the set of minimizers of the non-smooth loss function $\Phi_n$. In order to minimize the non-smooth
$\Phi_n$, the subgradient information will be used. However, the employment of such information is
not possible via [56, 62, 63], since the subgradient projection mapping (Definition 6) belongs to the class of
strongly attracting quasi-nonexpansive mappings, which is strictly larger than the class of strongly attracting
nonexpansive operators, utilized in [56, 62, 63].

The set $B_{\ell_1}[w_n, \rho]$ is a closed convex set, and its metric projection mapping is given as follows. To save
space, we give here a short description. For the full discussion, the interested reader can refer to [35].

Fact 24 (Metric projection mapping onto the weighted ^i-ball [35])- Given x G M. L \ B^ x [w n , p], there exists 
an /* G 1, L, and a set of integers {/j} J -g ^ +1 L C I* + 1, L, such that the metric projection PB e [w n ,p]( x ) 1S given 
by a permutation on the components of the following vector 



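$$\Bigg[ x_1 - \frac{\sum_{i=1}^{l_*} w_{n,i} |x_i| - \rho}{\sum_{i=1}^{l_*} w_{n,i}^2} \, \mathrm{sgn}(x_1)\, w_{n,1}, \; \ldots, \; x_{l_*} - \frac{\sum_{i=1}^{l_*} w_{n,i} |x_i| - \rho}{\sum_{i=1}^{l_*} w_{n,i}^2} \, \mathrm{sgn}(x_{l_*})\, w_{n,l_*}, \; 0, \ldots, 0 \Bigg]^t \qquad (14)$$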



where 



Vj G l,U 



Without any loss of generality, we assume that $P_{B_{\ell_1}[w_n,\rho]}(x)$ is given by (14) in the sequel.
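
For illustration only, the metric projection onto a weighted $\ell_1$-ball can be computed by a sorting-based procedure of the following kind. This is a sketch under stated assumptions, not necessarily the exact procedure of [35]: the components are ordered by the ratios $|x_j|/w_j$, a common threshold $\lambda$ of the form appearing in (14) is found, and every component is soft-thresholded by $\lambda w_j$. The function name is hypothetical.

```python
def project_weighted_l1_ball(x, w, rho):
    """Metric projection of x onto {y : sum_j w[j]*|y[j]| <= rho}, with w[j] > 0, rho > 0."""
    if sum(wj * abs(xj) for wj, xj in zip(w, x)) <= rho:
        return list(x)  # already inside the ball
    # Sort indices by |x_j| / w_j in decreasing order (the O(L log L) part).
    order = sorted(range(len(x)), key=lambda j: abs(x[j]) / w[j], reverse=True)
    num, den, lam = -rho, 0.0, 0.0
    for j in order:
        num += w[j] * abs(x[j])
        den += w[j] * w[j]
        candidate = num / den
        if abs(x[j]) / w[j] > candidate:
            lam = candidate  # component j stays nonzero for this threshold
        else:
            break            # the remaining components are set to zero
    # Soft-threshold each component by lam * w_j.
    return [((xj > 0) - (xj < 0)) * max(abs(xj) - lam * wj, 0.0)
            for xj, wj in zip(x, w)]


p = project_weighted_l1_ball([3.0, 1.0], [1.0, 1.0], 2.0)
q = project_weighted_l1_ball([3.0, -3.0], [1.0, 2.0], 3.0)
```

In both examples the returned point satisfies the weighted constraint with equality, as expected for the projection of a point lying outside the ball.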

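Regarding Definition 23, consider the following assumptions.

Assumption 25.

1. The sequence of weight vectors $(w_n)_{n\in\mathbb{N}}$ is constructed such that $\forall n \in \mathbb{N}$, $\forall j \in \overline{1,L}$, $w_{n,j} \in [\check\epsilon, \hat\epsilon]$, for some
$\check\epsilon, \hat\epsilon > 0$.

2. Given the sequence of relaxed subgradient projection mappings $(T_{\Phi_n}^{(\nu_n)})_{n\in\mathbb{N}}$, with respect to the sequence
$(\Phi_n)_{n\in\mathbb{N}}$ in Definition 23, there exists $\epsilon' \in (0,1]$ such that $\forall n \in \mathbb{N}$, $\nu_n \in [\epsilon', 2 - \epsilon']$.

Lemma 26. The following properties hold true.

1. The subdifferentials of the loss functions $(\Phi_n)_{n\in\mathbb{N}}$, defined in (13), are given in Table 1.

2. Let Assumption 25.1 hold true. Then, $\forall x \in \mathbb{R}^L$, $(\Phi_n'(x))_{n\in\mathbb{N}}$ is bounded.

3. Let Assumption 25.1 hold true. Then, $\mathrm{int}(\bigcap_{n\in\mathbb{N}} B_{\ell_1}[w_n, \rho]) \neq \emptyset$.

4. Let Assumptions 25.1 and 25.2 hold true. Then, the sequence of relaxed subgradient projection mappings
$(T_{\Phi_n}^{(\nu_n)})_{n\in\mathbb{N}}$ satisfies Assumption 14.

$x$ | $\partial\Phi_n(x)$
$\sum_{j=1}^L w_{n,j}|x_j| < \rho$ | $\{0\}$
$\sum_{j=1}^L w_{n,j}|x_j| > \rho$, $Z_x = \emptyset$ | $\{[w_{n,1}\,\mathrm{sgn}(x_1), \ldots, w_{n,L}\,\mathrm{sgn}(x_L)]^t\}$
$\sum_{j=1}^L w_{n,j}|x_j| > \rho$, $Z_x \neq \emptyset$ | $\mathrm{conv}\{u_1, \ldots, u_{2^r}\}$
$\sum_{j=1}^L w_{n,j}|x_j| = \rho$, $Z_x = \emptyset$ | $\mathrm{conv}\{0, [w_{n,1}\,\mathrm{sgn}(x_1), \ldots, w_{n,L}\,\mathrm{sgn}(x_L)]^t\}$
$\sum_{j=1}^L w_{n,j}|x_j| = \rho$, $Z_x \neq \emptyset$ | $\mathrm{conv}\{0, u_1, \ldots, u_{2^r}\}$

where the vectors $u_k$, $\forall k \in \overline{1, 2^r}$, are given by $u_{k,j} := w_{n,j}\,\mathrm{sgn}(x_j)$, if $j \notin Z_x$, and $u_{k,j} := \pm w_{n,j}$, if $j \in Z_x$,
with each $k$ corresponding to one of the $2^r$ possible sign patterns on $Z_x$.

Table 1. Here, $Z_x := \{j \in \overline{1,L} : x_j = 0\}$, and $r$ stands for the cardinality of $Z_x$, whenever
$Z_x \neq \emptyset$. The conv symbol stands for the convex hull of a set.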



Proof. 1. To save space, the calculation of the subdifferentials in Table 1 is omitted. These results can be
reproduced by using standard arguments of convex analysis, e.g., [m Thm. 25.6]. 

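2. Lemma 26.2 can be easily established by Assumption 25.1 and Table 1.

3. Choose any $x \in B(0, \frac{\rho}{L\hat\epsilon})$. Then, $\forall j \in \overline{1,L}$, $|x_j| \le \frac{\rho}{L\hat\epsilon}$. Moreover, $\sum_{j=1}^L w_{n,j}|x_j| \le \sum_{j=1}^L \hat\epsilon \frac{\rho}{L\hat\epsilon} = \rho$. Hence,
$B(0, \frac{\rho}{L\hat\epsilon}) \subset B_{\ell_1}[w_n, \rho]$, $\forall n \in \mathbb{N}$. This clearly suggests that $0 \in \mathrm{int}(\bigcap_{n\in\mathbb{N}} B_{\ell_1}[w_n, \rho])$, which establishes
Lemma 26.3.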

4. First, notice that $\forall n \in \mathbb{N}$, $\mathrm{Fix}(T_{\Phi_n}^{(\nu_n)}) = B_{\ell_1}[w_n, \rho]$. Now, according to Assumption 14, fix arbitrarily a
subsequence $N \in \mathcal{N}_\infty^{\#}$, a sequence $(x_n)_{n\in N} \subset \mathbb{R}^L$, and a $\gamma > 0$ such that $\forall n \in N$, $d(x_n, B_{\ell_1}[w_n, \rho]) \ge \gamma$.
Notice by Fact 24 the following: $\forall n \in N$,

2 



7 < ^(aJn.B^ItOn.p]) 

(Si=l w n,i\X n . 
= / \ \ 2 



-Pb^j [t« n ,p] (-^n) 



- u;; 



1,3 T Z-/ n >3 

j=U+l 



I* 



r- 



'•=1 Ei*=i<i 



2 ( E/=l ^n.il^n,' 



< 

which, in turn, results into 
/ L 



X^ L I 



L 
3=1 



EL 2 
3=1 W n,3 



Le 2 



: <5' 2 > 0, Vn G iV. 
Notice, also, by Example 6 and Lemma 26.2 that $\forall n \in N$,



{I-Tt ] )(Xn 



5' 

>e->0, 



which clearly suggests that $\exists \delta > 0$ such that $\liminf_{n\in N} \|(I - T_{\Phi_n}^{(\nu_n)})(x_n)\| \ge \delta$. This establishes Lemma 26.4.

□ 



4.2. Exploring $(\Theta_n)_{n\in\mathbb{N}}$. In this section, the metric distance function to closed convex sets will be used in
order to define a sequence of loss functions $(\Theta_n)_{n\in\mathbb{N}}$. Such sequences have already found numerous applications
in online signal processing and machine learning tasks [53, 54, 58, 66, 67], under the light, however, of the
predecessors [56, 62, 63] of the present framework. In this section, this specific sequence $(\Theta_n)_{n\in\mathbb{N}}$ will be
blended with the more general class of strongly attracting quasi-nonexpansive mappings in order to construct
Algorithm 29. Given the wide applicability of the techniques in [56, 62, 63], it is natural to anticipate an even
larger span of usage for Algorithm 29. Such a potential will be demonstrated in Section 5, where Algorithm 29
is applied to the online sparse system/signal recovery task.

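Definition 27. Assume a sequence of nonempty closed convex sets $(S_n)_{n\in\mathbb{N}}$. Given a user-defined $q \in \mathbb{N}^*$, define
the following index set

$$\mathcal{J}_n := \overline{\max\{0, n - q + 1\}, n}, \quad \forall n \in \mathbb{N}.$$

Notice that the sequence $(\mathcal{J}_n)_{n\in\mathbb{N}}$ depicts a sliding window on the set $\mathbb{N}$, of length at most $q$.

Let us introduce a sequence of convex functions $(\Theta_n : \mathcal{H} \to [0,\infty))_{n\in\mathbb{N}}$ inductively. For every $n \in \mathbb{N}$, and
given a $u_n \in \mathcal{H}$, define the following active index set:

$$\mathcal{I}_n := \{i \in \mathcal{J}_n : u_n \notin S_i\}.$$

This set identifies those closed convex sets $\{S_i\}_{i\in\mathcal{I}_n}$, out of $\{S_j\}_{j\in\mathcal{J}_n}$, which add new "information" to our
learning process. The sets with indexes $\{j \in \mathcal{J}_n : u_n \in S_j\}$ will not be processed at the time instant $n$.

In the case where $\mathcal{I}_n \neq \emptyset$, we introduce the set of weights $\{\omega_i^{(n)}\}_{i\in\mathcal{I}_n} \subset (0,1]$, such that $\sum_{i\in\mathcal{I}_n} \omega_i^{(n)} = 1$.
Define, now, the convex function:

$$\forall x \in \mathcal{H}, \quad \Theta_n(x) := \begin{cases} \sum_{i\in\mathcal{I}_n} \frac{\omega_i^{(n)} d(u_n, S_i)}{L_n} \, d(x, S_i), & \text{if } \mathcal{I}_n \neq \emptyset, \\ 0, & \text{if } \mathcal{I}_n = \emptyset, \end{cases} \qquad (15)$$

where $L_n := \sum_{i\in\mathcal{I}_n} \omega_i^{(n)} d(u_n, S_i)$. We define $L_n := 0$ for all those $n \in \mathbb{N}$ such that $\mathcal{I}_n = \emptyset$.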
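
To make the construction of Definition 27 concrete, the following sketch evaluates the sliding window, the active index set, and $\Theta_n$ in a toy setting. The toy setting is an assumption made only for illustration: $\mathcal{H} = \mathbb{R}$, each $S_i$ is a closed interval, and the weights are uniform, $\omega_i^{(n)} := 1/\#\mathcal{I}_n$. All function names are hypothetical.

```python
def dist_interval(x, interval):
    """Distance from the real number x to the closed interval (lo, hi)."""
    lo, hi = interval
    return max(lo - x, 0.0, x - hi)


def sliding_window(n, q):
    """Index set J_n = {max(0, n-q+1), ..., n}."""
    return list(range(max(0, n - q + 1), n + 1))


def active_set(u_n, sets, n, q):
    """Index set I_n: indices in J_n whose set does not contain u_n."""
    return [i for i in sliding_window(n, q) if dist_interval(u_n, sets[i]) > 0.0]


def theta(x, u_n, sets, n, q):
    """Loss Theta_n(x) of (15), with uniform weights over the active set."""
    active = active_set(u_n, sets, n, q)
    if not active:
        return 0.0
    omega = 1.0 / len(active)
    L_n = sum(omega * dist_interval(u_n, sets[i]) for i in active)
    return sum(omega * dist_interval(u_n, sets[i]) / L_n * dist_interval(x, sets[i])
               for i in active)


sets = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
window = sliding_window(2, 2)            # the two most recent indices
active = active_set(2.5, sets, 2, 2)     # only the set that misses u_n = 2.5
value = theta(2.5, 2.5, sets, 2, 2)      # Theta_n evaluated at u_n itself
```

Sets that already contain the current estimate drop out of the sum, so the loss only measures the violation of the constraints that still carry new information.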

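Lemma 28. The following properties hold true for the sequence of functions $(\Theta_n)_{n\in\mathbb{N}}$ given in (15).

1. For every $n \in \mathbb{N}$, such that $\mathcal{I}_n \neq \emptyset$, we have $L_n > 0$.

2. For every $n \in \mathbb{N}$, $\mathrm{lev}_{\le 0}\Theta_n = \bigcap_{i\in\mathcal{I}_n} S_i$, where we define $\bigcap_{i\in\emptyset} S_i := \mathcal{H}$, to cover also the case where $\mathcal{I}_n = \emptyset$.

3. The collection of all the subgradients of $(\Theta_n)_{n\in\mathbb{N}}$ is bounded, i.e., $\forall n \in \mathbb{N}$, $\forall x \in \mathcal{H}$, $\|\Theta_n'(x)\| \le 1$.

4. For any $n \in \mathbb{N}$,

$$\Theta_n'(u_n) = \begin{cases} \frac{1}{L_n} \sum_{i\in\mathcal{I}_n} \omega_i^{(n)} (u_n - P_{S_i}(u_n)), & \text{if } \mathcal{I}_n \neq \emptyset, \\ 0, & \text{if } \mathcal{I}_n = \emptyset. \end{cases}$$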

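Proof. 1. Fix arbitrarily an $n \in \mathbb{N}$ such that $\mathcal{I}_n \neq \emptyset$. By the definition of $\mathcal{I}_n$, $\forall i \in \mathcal{I}_n$, $d(u_n, S_i) > 0$. Since,
also, $\omega_i^{(n)} \in (0,1]$, $\forall i \in \mathcal{I}_n$, it is clear by the definition of $L_n$ that Lemma 28.1 holds true.

2. Fix arbitrarily an $n \in \mathbb{N}$. Assume, first, that $\mathcal{I}_n = \emptyset$. By (15), it is clear that $\mathrm{lev}_{\le 0}\Theta_n = \mathcal{H} =: \bigcap_{i\in\emptyset} S_i$.

Assume, now, that $\mathcal{I}_n \neq \emptyset$, and that $\bigcap_{i\in\mathcal{I}_n} S_i \neq \emptyset$. It is clear by (15) that $\bigcap_{i\in\mathcal{I}_n} S_i \subset \mathrm{lev}_{\le 0}\Theta_n$. Assume,
now, an $x \notin \bigcap_{i\in\mathcal{I}_n} S_i$, or equivalently, $\exists i_0 \in \mathcal{I}_n$ such that $d(x, S_{i_0}) > 0$. Then, one can easily verify that
$\Theta_n(x) \ge \frac{\omega_{i_0}^{(n)} d(u_n, S_{i_0})}{L_n} d(x, S_{i_0}) > 0$. In other words, $x \notin \mathrm{lev}_{\le 0}\Theta_n$, and finally $\mathrm{lev}_{\le 0}\Theta_n \subset \bigcap_{i\in\mathcal{I}_n} S_i$. Notice
that the previous arguments hold true also in the case where $\bigcap_{i\in\mathcal{I}_n} S_i = \emptyset$. This establishes Lemma 28.2.

3. Fix arbitrarily an $n \in \mathbb{N}$. By (15), basic calculus on subdifferentials [29, 46] suggests that

$$\partial\Theta_n(x) = \begin{cases} \sum_{i\in\mathcal{I}_n} \frac{\omega_i^{(n)} d(u_n, S_i)}{L_n} \, \partial d(x, S_i), & \text{if } \mathcal{I}_n \neq \emptyset, \\ \{0\}, & \text{if } \mathcal{I}_n = \emptyset. \end{cases}$$

From now on, we deal only with the case where $\mathcal{I}_n \neq \emptyset$, since the previous equation clearly suggests
that Lemma 28.3 holds trivially in the case of $\mathcal{I}_n = \emptyset$.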

By Example O the subgradient @' n (x) takes the following form: 

w ntt \ Ui n) d(u n ,Si) , ^ wj n) d(u n ,5i) , 
Vx£H, @ n (x)= > — d(x,Si)+ > — d{x,Si) 

Ljfb — lj n 

ST uj n) d(u n ,Sj) x - P Sl {x) ^ upd(un,Si) j, f 

Hence, 

VtpV No/ Mil < V ^ n) rf(«n,gi) ||g-P Si (a)|| u\ n) d(un,Si) 

^ in 

This establishes Lemma 28.3.

4. Lemma 28.4 is an immediate consequence of (16). □

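Algorithm 29. Assume a sequence of nonempty closed convex sets $(S_n)_{n\in\mathbb{N}} \subset \mathcal{H}$. Moreover, consider a
sequence of convex continuous functions $(\Phi_n : \mathcal{H} \to \mathbb{R})_{n\in\mathbb{N}}$, such that $\mathrm{lev}_{\le 0}\Phi_n \neq \emptyset$, $\forall n \in \mathbb{N}$. Associated to
each $\Phi_n$ is the relaxed subgradient projection mapping $T_{\Phi_n}^{(\nu_n)}$ (see Definition 6), where $\nu_n \in (0,2)$, $\forall n \in \mathbb{N}$.
For an arbitrarily chosen $u_0 \in \mathcal{H}$, form the following sequence:

$$\forall n \in \mathbb{N}, \quad u_{n+1} := \begin{cases} T_{\Phi_n}^{(\nu_n)}\Big(u_n - \lambda_n \frac{\Theta_n(u_n)}{\|\Theta_n'(u_n)\|^2} \Theta_n'(u_n)\Big), & \text{if } \Theta_n'(u_n) \neq 0, \\ T_{\Phi_n}^{(\nu_n)}(u_n), & \text{otherwise}, \end{cases}$$

where the sequence of functions $(\Theta_n)_{n\in\mathbb{N}}$ is given in Definition 27, $\Theta_n'(u_n)$ is any subgradient of $\Theta_n$ at $u_n$, and
$\lambda_n \in (0,2)$, $\forall n \in \mathbb{N}$.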

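Lemma 28.4 and some elementary algebra lead to the following equivalent formulation of the previous
recursion:

$$\forall n \in \mathbb{N}, \quad u_{n+1} = T_{\Phi_n}^{(\nu_n)}\Big(u_n + \mu_n \Big(\sum_{i\in\mathcal{I}_n} \omega_i^{(n)} P_{S_i}(u_n) - u_n\Big)\Big), \qquad (17)$$

where $\mu_n := \lambda_n \mathcal{M}_n$, and

$$\mathcal{M}_n := \begin{cases} \frac{\sum_{i\in\mathcal{I}_n} \omega_i^{(n)} d^2(u_n, S_i)}{\big\|\sum_{i\in\mathcal{I}_n} \omega_i^{(n)} (u_n - P_{S_i}(u_n))\big\|^2}, & \text{if } \sum_{i\in\mathcal{I}_n} \omega_i^{(n)} (u_n - P_{S_i}(u_n)) \neq 0, \\ 1, & \text{otherwise}. \end{cases} \qquad (18)$$

To avoid any ambiguity in the case where $\mathcal{I}_n = \emptyset$, we define in (17) and (18): $\sum_{i\in\emptyset} \omega_i^{(n)} (P_{S_i}(u_n) - u_n) :=
\sum_{i\in\emptyset} \omega_i^{(n)} P_{S_i}(u_n) - u_n := 0$. Notice also by the convexity of $\|\cdot\|^2$ that $\mathcal{M}_n \ge 1$, and that since $\lambda_n \in (0,2)$, we
obtain $\mu_n \in (0, 2\mathcal{M}_n)$, i.e., the extrapolation parameter $\mu_n$ is able to take values greater than or equal to 2,
$\forall n \in \mathbb{N}$.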
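
One iteration of the recursion in (17) and (18) can be sketched as follows. The sketch makes several assumptions for illustration: vectors are plain Python lists, the sets in the current window are supplied through their projection mappings, the weights are uniform over the active set, and the outer mapping `T` (standing in for the relaxed subgradient projection) defaults to the identity. The function name `apsm_step` is hypothetical.

```python
def apsm_step(u, projections, T=lambda x: x, lam=1.0):
    """One extrapolated parallel-projection step followed by the mapping T."""
    projected = [P(u) for P in projections]
    # Active sets: those whose projection actually moves u, i.e., u is not in the set.
    active = [p for p in projected if any(pj != uj for pj, uj in zip(p, u))]
    if not active:
        return T(list(u))
    omega = 1.0 / len(active)  # uniform convex weights (an assumption)
    # Weighted average displacement: sum_i omega * (u - P_i(u)).
    disp = [sum(omega * (uj - p[j]) for p in active) for j, uj in enumerate(u)]
    num = sum(omega * sum((uj - pj) ** 2 for uj, pj in zip(u, p)) for p in active)
    den = sum(d * d for d in disp)
    M = num / den if den > 0.0 else 1.0  # extrapolation factor, at least 1
    mu = lam * M
    return T([uj - mu * dj for uj, dj in zip(u, disp)])


# Two toy halfspaces in the plane: {x : x[0] <= 0} and {x : x[1] <= 0}.
def proj_a(x):
    return [min(x[0], 0.0), x[1]]


def proj_b(x):
    return [x[0], min(x[1], 0.0)]


u_next = apsm_step([1.0, 1.0], [proj_a, proj_b])
```

In this toy run the extrapolation factor equals 2, so a single step with $\lambda_n = 1$ already reaches the intersection of the two halfspaces, whereas the plain average of the two projections would only move halfway.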

It is needless to say that the results presented in Theorem 18 hold true also for Algorithm 29. Nevertheless,
one can establish additional properties for Algorithm 29 based on the following assumptions.

Assumption 30. Regarding Definition 27 and Algorithm 29, assume the following.

1. Let $\check\omega := \inf\{\omega_i^{(n)} : i \in \mathcal{I}_n \neq \emptyset, n \in \mathbb{N}\} > 0$.

2. The sequences $(\Phi_n'(u_n))_{n\in\mathbb{N}}$ and $(\Phi_n'(T_{\Theta_n}^{(\lambda_n)}(u_n)))_{n\in\mathbb{N}}$ are bounded, i.e., there exists a $D > 0$ such that

$$\forall n \in \mathbb{N}, \quad \max\big\{\|\Phi_n'(u_n)\|, \|\Phi_n'(T_{\Theta_n}^{(\lambda_n)}(u_n))\|\big\} \le D.$$

3. $\forall n \in \mathbb{N}$, $\Phi_n : \mathcal{H} \to [0,\infty)$. See, for example, Definition 23.



Theorem 31. The following statements are valid for Algorithm 29.

1. Let Assumption 17.2 hold true. Then, there exists a $D > 0$ such that $\forall n \in \mathbb{N}$, $L_n \le D$.

2. Let Assumptions 17.2, 17.3, and 30.1 hold true. Then, $\lim_{n\to\infty} \max\{d(u_n, S_j) : j \in \mathcal{J}_n\} = 0$.

3. If Assumptions 17.2, 17.3, 17.9, and 30.1 hold true, then $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} S_n$. Moreover, if there
exists a $u_* \in \mathcal{H}$ such that $\lim_{n\to\infty} u_n = u_*$, i.e., $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) = \{u_*\}$, then $u_* \in \liminf_{n\to\infty} S_n$.

4. Let Assumptions 17.2, 17.3, 25.2, and 30.2 hold true. Then, $\limsup_{n\to\infty} \Phi_n(u_n) \le 0$. If, in addition,
Assumption 30.3 holds true, then $\lim_{n\to\infty} \Phi_n(u_n) = 0$.

5. The following result applies to the next section where a system/signal recovery task is considered. Assume
Algorithm 29 for the case where $\mathcal{H} := \mathbb{R}^L$, $L \in \mathbb{N}^*$, equipped with the standard vector inner product.
Assume, also, that the sequence of functions $(\Phi_n)_{n\in\mathbb{N}}$ is given by Definition 23. Let Assumptions 17.2,
17.3, 25.1, and 25.2 hold true. Then, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \subset \limsup_{n\to\infty} B_{\ell_1}[w_n, \rho]$. If there exists a $u_*$ such that
$\lim_{n\to\infty} u_n = u_*$, then $u_* \in \liminf_{n\to\infty} B_{\ell_1}[w_n, \rho]$.

Proof. 1. Notice, that Vn G N, Mi G J n , Vv G 

d(u n ,Si) = \\u n - Psi{u n )\\ < \\u n - v\\ + \\v - P Sl {u n )\\ < 2 \\u n -v\\ < 2 ||tt„ - u|| , 

where no := minA^, the second inequality follows from Example [7] and the third one from ([7]). Now, by 
the definition of L n , Vn G N, 

L n =^2 0J^d(u n , Si) < w | n) \\ u n - v\\ = 2 \\u no -v\\. 

Choose, now, any D > max{2 ||u no — v\\ , Lq, . . . , L no _i}, and notice that for such a D the claim holds true. 
2. Recall, here, by Definition 27 that if $u_n$ is such that $\mathcal{I}_n = \emptyset$, then $d(u_n, S_j) = 0$, $\forall j \in \mathcal{J}_n$. Obviously, this
is equivalent to $\max\{d(u_n, S_j) : j \in \mathcal{J}_n\} = 0$.
Hence, we deal only with the case of $\mathcal{I}_n \neq \emptyset$. For this case, we observe by (15) that

$$\Theta_n(u_n) = \sum_{i\in\mathcal{I}_n} \frac{\omega_i^{(n)} d^2(u_n, S_i)}{L_n} \ge \frac{1}{D} \sum_{i\in\mathcal{I}_n} \omega_i^{(n)} d^2(u_n, S_i) \ge \frac{\check\omega}{D} \max\{d^2(u_n, S_i) : i \in \mathcal{I}_n\}, \qquad (19)$$

where the existence of $D > 0$ is guaranteed by Theorem 31.1.

In order to establish Theorem 18.6, i.e., $\lim_{n\to\infty} \Theta_n(u_n) = 0$, we have used Assumption 17.4, which
imposes a bound on the sequence of subgradients $(\Theta_n'(u_n))_{n\in\mathbb{N}}$. However, for the case at hand, Lemma 28.3
clearly suggests that boundedness holds true by default, that Assumption 17.4 is not necessary here, and
that Assumptions 17.2 and 17.3 are sufficient for establishing $\lim_{n\to\infty} \Theta_n(u_n) = 0$. Having this result hold
true, apply $\lim_{n\to\infty}$ on both sides of (19) to obtain $\lim_{n\to\infty} \max\{d(u_n, S_i) : i \in \mathcal{I}_n\} = 0$.



Recall, now, by the definition of $\mathcal{I}_n$, in Definition 27, that $\forall j \in \mathcal{J}_n \setminus \mathcal{I}_n$, $u_n \in S_j \Leftrightarrow d(u_n, S_j) = 0$. This
clearly implies that $\max\{d(u_n, S_i) : i \in \mathcal{I}_n\} = \max\{d(u_n, S_j) : j \in \mathcal{J}_n\}$. This equality and the previously
obtained result $\lim_{n\to\infty} \max\{d(u_n, S_i) : i \in \mathcal{I}_n\} = 0$ establish Theorem 31.2.

3. We have already seen in Theorem 31.2 that $\lim_{n\to\infty} \max\{d(u_n, S_j) : j \in \mathcal{J}_n\} = 0$. Since, by definition,
$n \in \mathcal{J}_n$, $\forall n \in \mathbb{N}$, the previous result implies that $\lim_{n\to\infty} d(u_n, S_n) = \lim_{n\to\infty} \|(I - P_{S_n})(u_n)\| = 0$. Having
these in mind, Theorem 13 1 131 becomes a direct consequence of Theorem [15] and Example [TBI 
Here, we will utilize Theorems 1 18151 and 118181 To this end, notice that regarding the sequence of mappings 
(2£ n )) neN) Assumption 117151 is satisfied here; indeed, notice that Vn G N, |- < 



v n — e' 



Now, Definition [2] suggests that Vn G N, (T^' 1 

(An)/ 



< *nC4* n) («»))• Notice 



that for all those n G N such that <Z>' n (T^™ > (u n )) / 0, we have 



^nK)<$n(7i A ; ) K)) + 



< 



U n )) 



^(4 A ; i) K)) 



+ 



T, 



(A») 



< 



D 



(Vn)\/rp(X n ) 



For all those n G N where & n (TQ™\u n )) = 0, we have by Definition [2J that Tq"' (u n ) G argmin^ g ^ Q n {v) 



e„ K)) 



+ D 



a,, 



T { e:\u n ) 



(An), 



and since lev<o <& n 7^ 0, we obtain Q n (TQ™\u n )) < 0. Therefore, by similar steps as previously, we obtain 



ri(An) 
6n 



(«n) 



the following inequality for such n G N: <& n (n n ) < -D 

If we apply $\limsup_{n\to\infty}$ on both sides of the previous inequalities, and if we recall Theorems 18.5 and
18.8, then we obtain $\limsup_{n\to\infty} \Phi_n(u_n) \le 0$. Notice that in the case where $\Phi_n : \mathcal{H} \to [0,\infty)$, $\forall n \in \mathbb{N}$,
the previous analysis leads to $\lim_{n\to\infty} \Phi_n(u_n) = 0$. This establishes Theorem 31.4.
5. First, notice that since we work in the Euclidean space $\mathbb{R}^L$, $\mathfrak{S}((u_n)_{n\in\mathbb{N}}) = \mathfrak{W}((u_n)_{n\in\mathbb{N}})$. Hence, the fact
$\mathfrak{S}((u_n)_{n\in\mathbb{N}}) \neq \emptyset$ is guaranteed by Theorem 18.4. Now, it can be verified that Theorem 31.5 is a direct
consequence of Theorems 18.4, 18.9, and Lemma 26.4. □



5. Application: Online Sparsity-Aware System/Signal Recovery 

The present section will demonstrate the potential of the previously introduced algorithms by devising a
time-adaptive method for the important, nowadays, sparse system/signal recovery task. In particular, we will
use Algorithm 29 to derive a low-complexity and similarly effective variant of the technique introduced in
[35, 50].

Sparsity is the key characteristic of systems or signals whose representation, by means of some basis in 
some domain, consists of only a few nonzero coefficients, while the majority of them retain values of negligible 
size. The exploitation of sparsity has been attracting recently an interest of exponential growth under the 
Compressive Sensing or Sampling (CS) framework |12U15] [28]. In principle, CS allows the estimation of 
sparse signals and systems using fewer measurements than those previously thought to be necessary. More 
importantly, recovery is realized by mobilizing efficient constrained minimization schemes. Indeed, it has been 
shown that sparsity is favored by l\ constrained solutions [I^HUGEICS] • 

Recall, here, that given two integers $j_1 \le j_2$, the notation $\overline{j_1, j_2}$ stands for the set $\{j_1, j_1 + 1, \ldots, j_2\}$. Assume
a vector $x_* := [x_{*,1}, \ldots, x_{*,L}]^t$ in the Euclidean space $\mathbb{R}^L$, $L \in \mathbb{N}^*$, where the superscript $t$ stands for vector
transposition. If the support of $x_*$ is defined as $\mathrm{supp}(x_*) := \{i \in \overline{1,L} : x_{*,i} \neq 0\}$, and the $\ell_0$ norm of $x_*$ is
defined as the cardinality of its support, i.e., $\|x_*\|_{\ell_0} := \#\mathrm{supp}(x_*)$, by the term "sparse" $x_*$, we refer to the
case where $\|x_*\|_{\ell_0}$ is considerably smaller than $L$.

The majority of CS techniques deal with the problem of estimating a sparse system $x_*$, based on a number
$K\,(< L)$ of measurements $(d_n)_{n=0}^{K-1} \subset \mathbb{R}$ that are generated by the following linear regression model (see (1)):

$$d_n = a_n^t x_* + \zeta_n, \quad \forall n \in \mathbb{N}. \qquad (20)$$

Here, $(a_n)_{n\in\mathbb{N}} \subset \mathbb{R}^L$ are the input vectors, which excite the unknown $x_*$, and $(\zeta_n)_{n\in\mathbb{N}}$ is a real-valued
discrete-time stochastic process which stands for the contaminating additive noise.

A well-known batch method for estimating the sparse a;*, based on a limited number K < L of measurements, 
is provided by the Least- Absolute Shrinkage and Selection Operator (LASSO) J221E2F-' 

$$\min\{\|Ax - d\|^2 : \|x\|_{\ell_1} \le \|x_*\|_{\ell_1}, \; x \in \mathbb{R}^L\},$$

where $\|\cdot\|$ stands for the classical Euclidean norm of a vector, $\|\cdot\|_{\ell_1}$ for the $\ell_1$ norm, i.e., $\|x\|_{\ell_1} := \sum_{i=1}^L |x_i|$,
$\forall x := [x_1, \ldots, x_L]^t \in \mathbb{R}^L$, $d := [d_0, \ldots, d_{K-1}]^t \in \mathbb{R}^K$, and $A \in \mathbb{R}^{K\times L}$ is the matrix whose rows are $(a_n^t)_{n=0}^{K-1}$.
We stress here that the term "batch" method means that the data $(a_n, d_n)_{n=0}^{K-1}$ have to be available prior to
the application of LASSO.

With only a few recent exceptions, i.e., [2|[T9 | [35 t H0 | [50] . the majority of the proposed, so far, CS techniques 
are appropriate for batch mode operation [13H16 . 25, 26J. In other words, one has to wait until a fixed and 
predefined number K of training data {a n , d n }n=Q is available prior to application of CS processing methods, 
e.g., LASSO, in order to recover the corresponding signal/system estimate. Dynamic online operation for 
updating and improving estimates, as new measurements become available, is not feasible by batch processing 
methods. The development of efficient, time-adaptive, sparsity-aware techniques is of great importance in
engineering, especially in cases where the signal or system under consideration is time-varying and/or the
available storage resources are limited.

Moving along the path introduced in [2, 19, 35, 40, 50], the present section will deal with the case where
$x_*$ is not only sparse but it is also allowed to be time-varying. For this reason, the number $K$ of available
data is allowed to take values towards $\infty$. In this sense, the studies [2, 19, 35, 40, 50] operate in a framework
that is different than the standard CS scenario. The major objective is no longer only the estimation of the
sparse signal or system, based on a limited number of measurements. Letting $K \to \infty$ in the design, the
additional task is the capability of the estimator to track possible variations of the unknown sparse system.
Moreover, this has to take place at an affordable computational complexity, as required by most real time
applications, where time-adaptive estimation is of interest. Consequently, the batch sparsity-aware techniques
developed under the CS framework, e.g., LASSO or one of its variants, become unsuitable under time-varying
scenarios. The focus, now, becomes the development of a framework that 1) exploits sparsity, 2) exhibits
fast convergence to error floors that are as close as possible to those obtained by their batch counterparts, 3)
offers good tracking performance, and 4) has low computational demands in order to meet the stringent time
constraints that are imposed by most real time operation scenarios. Such a framework was demonstrated in
[351 136 , 40 . 50Tf52] . Here, we focus on [35j[50]. Motivated by the previously presented Algorithm [291 we devise 
a variant of [35, 50], which shows similar performance to [35, 50], albeit with lower computational requirements.
The information at our disposal is the sequence of training data $(a_n, d_n)_{n\in\mathbb{N}}$, the a-priori knowledge that
the unknown $x_*$ in (20) is sparse, as well as an estimate of the cardinality of the support of $x_*$, i.e., $\|x_*\|_{\ell_0}$.
In the sequel, we will demonstrate a way to incorporate the a-priori knowledge of the estimate of $\|x_*\|_{\ell_0}$ in
the design as a series of closed convex sets.

In the spirit of Algorithm 29, we begin by introducing a sequence of closed convex sets $(S_n)_{n\in\mathbb{N}}$, which
associate to the available training data $(a_n, d_n)_{n\in\mathbb{N}}$, and quantify the deviation from the adopted model of
(20) by the introduction of a user-defined tolerance $\epsilon > 0$.

Definition 32 (Closed hyperslab). Given the online training data $(a_n, d_n)_{n\in\mathbb{N}} \subset \mathbb{R}^L \times \mathbb{R}$, and a user-defined
$\epsilon > 0$, we define the following sequence of closed convex sets, called closed hyperslabs:

$$\forall n \in \mathbb{N}, \quad S_n := \{x \in \mathbb{R}^L : |d_n - a_n^t x| \le \epsilon\}.$$

The metric projection mapping $P_{S_n}$ can be analytically computed [54, 58]; it breaks down to the metric
projection onto a hyperplane, and its computational complexity scales linearly to the number of unknowns $L$.
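
The projection onto a closed hyperslab can be sketched as follows, using the standard closed form: a point outside the slab is projected onto the nearest of the two bounding hyperplanes. The sketch assumes $a \neq 0$ and represents vectors as Python lists; the function name is hypothetical.

```python
def project_hyperslab(x, a, d, eps):
    """Metric projection of x onto {y : |d - <a, y>| <= eps}, assuming a != 0."""
    inner = sum(aj * xj for aj, xj in zip(a, x))
    if inner > d + eps:
        shift = d + eps - inner  # move back to the hyperplane <a, y> = d + eps
    elif inner < d - eps:
        shift = d - eps - inner  # move up to the hyperplane <a, y> = d - eps
    else:
        return list(x)           # x already lies in the hyperslab
    norm2 = sum(aj * aj for aj in a)
    return [xj + shift / norm2 * aj for xj, aj in zip(x, a)]


p = project_hyperslab([3.0, 2.0], [1.0, 0.0], 0.0, 1.0)
```

The cost is one inner product and one scaled vector addition, i.e., linear in $L$, in agreement with the complexity claim above.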

In this section we mobilize Algorithm 29, where $(S_n)_{n\in\mathbb{N}}$ becomes the sequence of closed hyperslabs of
Definition 32, and $(\Phi_n)_{n\in\mathbb{N}}$ is the sequence of sparsity-aware functions introduced in Definition 23.
Algorithm 29, with the metric projection mapping $P_{B_{\ell_1}[w_n,\rho]}$ used instead of $T_{\Phi_n}^{(\nu_n)}$, was introduced in [35, 50].
The necessary complexity in order to compute $P_{B_{\ell_1}[w_n,\rho]}$ is of order $\mathcal{O}(L \log_2 L)$, needed for a sorting
operation, and $\mathcal{O}(L)$ multiplications and additions [35, 50]. In the present study, due to the utilization of the
relaxed subgradient projection mapping $T_{\Phi_n}^{(\nu_n)}$ in Algorithm 29, together with the simplicity of the subgradients
of $\Phi_n$, seen in Table 1, we are able to cut down the computational complexity of the algorithm to $\mathcal{O}(L)$
operations. As will be made clear by the subsequent numerical experiments, Algorithm 29 results in
a similar performance to its predecessor [35, 50].

The reason for introducing a series of weighted $\ell_1$-balls $B_{\ell_1}[w_n, \rho]$, instead of the standard unweighted one
$B_{\ell_1}[\mathbf{1}, \rho]$, is that 1) we have observed that the weighted $\ell_1$-balls, introduced in Definition 23, offer enhanced
convergence speed, as also demonstrated in [16, 25] in a different context, and 2) the weighted balls help us
easily incorporate the a-priori knowledge of the cardinality of the support of $x_*$, i.e., $\|x_*\|_{\ell_0}$, in the radius $\rho$,
as the following lemma suggests.

Lemma 33. Assume that the sequence $(u_n)_{n\in\mathbb{N}}$, generated by Algorithm 29, converges to the desirable $x_*$.
Then, there exists an $N \in \mathcal{N}_\infty$ such that $\forall \rho \ge \|x_*\|_{\ell_0}$, $\forall n \in N$, $u_n \in B_{\ell_1}[w_n, \rho]$.

Proof. By definition, Ya=i Wn,i\u n ,i\ = Ya=i w™a+v Vn e N - Since lim «->°° u n = x*, 



Vn <E N, S n := {x € M. L : \d. 



'71 



aix\ < £}. 



L 



L 



L 



lim sup 




i=i 



lim sup 




i=l 



E 



+ E 



x 



< E 




i€supp(a;*) 



i^supp(a3„) 



iGsupp(a;* ) 
The previous strict inequality and the definition of $\limsup$ suggest that there exists an $N \in \mathcal{N}_\infty$ such that $\forall n \in N$
we have $\sum_{i=1}^L w_{n,i}|u_{n,i}| < \|x_*\|_{\ell_0}$. In other words, we obtain that $\forall n \in N$, $\forall \rho \ge \|x_*\|_{\ell_0}$, $u_n \in B_{\ell_1}[w_n, \rho]$.
This establishes Lemma 33. □

In other words, Lemma 33 suggests that in order to have the sequence $(u_n)_{n\in\mathbb{N}}$ converge to $x_*$, a necessary
condition is to set the radius $\rho$, in the weighted balls $(B_{\ell_1}[w_n, \rho])_{n\in\mathbb{N}}$, to a value that over-estimates $\|x_*\|_{\ell_0}$.
This strategy will be followed in the subsequent numerical examples.

5.1. Numerical examples. In this section, the performance of the proposed algorithm is evaluated for both 
time-invariant and time- varying systems. To save space, only a couple of scenarios are considered. For extensive 
experiments on the behavior of similar in spirit algorithms, the interested reader is referred to |35|I36|. 

The proposed methodology is compared to a couple of recent time-adaptive methods [2, 19] which belong to
the same algorithmic family: the cost function to be minimized is the sum of a quadratic loss, accounting for
the regression model, and an $\ell_1$-norm regularization term, which infuses sparsity into the design.
The method RZ-LMS [19] is built upon the classical Least Mean Squares (LMS) algorithm, and employs
re-weighting for the regularization term. Its computational complexity scales linearly with respect to the system
unknowns, i.e., it is of order $\mathcal{O}(L)$. Re-weighting of the $\ell_1$-norm is also utilized in OCCD-TNWL [2], where
the quadratic regression term follows the strategy in the celebrated Recursive Least Squares (RLS) method,
scoring an overall computational complexity of order $\mathcal{O}(4L^2)$.

Moreover, we employed batch methods for solving the classical LASSO [9, 10, 59], as well as its re-weighted
variant [68]. That is, for every batch method, each point in the respective curves is the outcome of a
sub-process which takes into account all the data available up to the current time instant. It is clear
that such an operation is infeasible in real-time implementations. Nevertheless, these performances will serve
as benchmarks for the $\ell_1$-norm regularized least squares solvers.

Fig. 1 refers to the case of a time-invariant system $x_*$, whose length is $L = 100$ and where only 5
coefficients, placed in arbitrary positions, are nonzero, i.e., $\|x_*\|_{\ell_0} = 5$. The values of the nonzero coefficients
were drawn from a Gaussian distribution of zero mean and variance equal to one. The input signal $(a_n)_{n\in\mathbb{Z}}$
is defined as a discrete-time Gaussian process of zero mean and variance equal to 1. The vectors $(\boldsymbol{a}_n)_{n\in\mathbb{N}}$ in
(20) are formed as follows: $\forall n \in \mathbb{N}$, $\boldsymbol{a}_n := [a_n, a_{n-1}, \ldots, a_{n-L+1}]^t$. The noise process $(\zeta_n)_{n\in\mathbb{N}}$ is Gaussian with
zero mean and variance equal to $\sigma_\zeta^2 := 0.1$.
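
A minimal sketch of how such training data could be generated is given below. It assumes the linear regression model $d_n = \boldsymbol{a}_n^t x_* + \zeta_n$, and it assumes that the input samples preceding time 0 are drawn from the same Gaussian process so that every regressor is fully populated. The function names and the seed are hypothetical choices for illustration only.

```python
import random


def make_sparse_system(length, num_nonzero, rng):
    """Sparse vector with Gaussian N(0, 1) values at randomly chosen positions."""
    x_star = [0.0] * length
    for i in rng.sample(range(length), num_nonzero):
        x_star[i] = rng.gauss(0.0, 1.0)
    return x_star


def generate_data(x_star, num_samples, noise_var, rng):
    """Return a list of pairs (a_vec, d_n), assuming d_n = a_vec . x_star + noise."""
    length = len(x_star)
    # Scalar input signal, including length - 1 samples of "pre-history".
    signal = [rng.gauss(0.0, 1.0) for _ in range(num_samples + length - 1)]
    noise_std = noise_var ** 0.5
    data = []
    for n in range(num_samples):
        newest = n + length - 1  # position of a_n inside `signal`
        # Regressor [a_n, a_{n-1}, ..., a_{n-L+1}].
        a_vec = [signal[newest - j] for j in range(length)]
        d_n = sum(a * x for a, x in zip(a_vec, x_star)) + rng.gauss(0.0, noise_std)
        data.append((a_vec, d_n))
    return data


rng = random.Random(0)
x_star = make_sparse_system(100, 5, rng)
data = generate_data(x_star, 500, 0.1, rng)
```

Consecutive regressors share all but one entry, which reflects the tapped-delay-line structure of $\boldsymbol{a}_n$ stated above.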

In Fig. 1, the tag "Proposed" refers to Algorithm 29. The curve "Proposed with exact projection mapping"
refers to Algorithm 29, but with $P_{B_{\ell_1}[w_n,\rho]}$ in the place of $T^{(n)}$, $\forall n \in \mathbb{N}$. This realization was introduced
in J55JED]. For both "Proposed" and "Proposed with exact projection mapping", $q$ was set equal to 25,
$\omega_j^{(n)} := 1/\#\mathcal{I}_n$, $\forall n \in \mathbb{N}$, in the cases where $\mathcal{I}_n \neq \emptyset$, $\rho := 6$, $\epsilon := 0.005$, and $\varepsilon := 2\sigma_\zeta$.

All of the parameters for the methods "LASSO" [9, 10, 59], "Weighted LASSO" [68], "OCCD-TNWL" [2],
and "RZ-LMS" [19] were tuned to produce the best respective performance for the current setting. More
specifically, the forgetting factor for "OCCD-TNWL" [2], which is an inherent parameter in any RLS-like
scheme, was set equal to 1. Moreover, "RZ-LMS" [19] was tuned so as to produce the lowest
error floor at iteration #450. Although different parameters for "RZ-LMS" could result in faster
convergence, this could only be obtained at the expense of higher error floors.

Fig. 1 demonstrates that "Proposed" and "Proposed with exact projection mapping" lead to similar
performance. However, due to the use of $T^{(n)}$ in "Proposed", the computational complexity drops
to $\mathcal{O}(qL)$, as opposed to $\mathcal{O}(qL + L\log L)$ in "Proposed with exact projection mapping", with $\mathcal{O}(L\log L)$
accounting for the sorting operations which are necessary for the computation of the exact $P_{B_{\ell_1}[w_n,\rho]}$.
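
To indicate where a sorting step typically enters an exact projection, the sketch below implements the well-known sort-based Euclidean projection onto the plain (unweighted) $\ell_1$ ball. This is a simpler, swapped-in special case with all weights equal to 1; it is not the weighted projection used in the experiments, and the function name is hypothetical.

```python
import math


def project_l1_ball(v, rho):
    """Euclidean projection of v onto {x : sum_i |x_i| <= rho}, with rho > 0."""
    if sum(abs(t) for t in v) <= rho:
        return list(v)  # already inside the ball
    # The sort below is the O(L log L) part of the computation.
    mags = sorted((abs(t) for t in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        cumsum += m
        candidate = (cumsum - rho) / k
        if m - candidate > 0:
            theta = candidate  # the k largest magnitudes stay active
        else:
            break
    # Soft-threshold every entry by theta, keeping the original signs.
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in v]


print(project_l1_ball([3.0, 1.0, 0.0], 2.0))
print(project_l1_ball([1.0, -1.0], 1.0))
```

Apart from the sort, the remaining operations are linear in the vector length, which is consistent with the extra $L\log L$ term being attributed to sorting.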
Figure 1. Time-invariant sparse system $x_* \in \mathbb{R}^{100}$ with $\|x_*\|_{\ell_0} = 5$. Here, the Mean Square Deviation (MSD) is defined as the following function of the number of training data: $\mathrm{MSD}(n) := \frac{1}{R}\sum_{r=1}^{R} \|x_* - u_n^{(r)}\|^2$, $\forall n \in \mathbb{N}$, where $R$ is the total number of independent runs of the experiment. Here, $R := 300$.
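
The MSD defined in the caption can be computed as in the following sketch. The data layout (`runs[r][n]` holding the estimate of run `r` at time `n`) and the function name are assumptions made for illustration.

```python
def msd_curve(x_star, runs):
    """Average squared Euclidean distance to x_star over independent runs.

    runs[r][n] is the estimate produced by run r at time n;
    the result is the list [MSD(0), MSD(1), ...].
    """
    num_runs = len(runs)
    num_steps = len(runs[0])
    curve = []
    for n in range(num_steps):
        total = 0.0
        for r in range(num_runs):
            total += sum((xs - un) ** 2 for xs, un in zip(x_star, runs[r][n]))
        curve.append(total / num_runs)
    return curve


# Toy usage with two runs, two time instants, and a 2-dimensional system.
toy_runs = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 0.0]],
]
print(msd_curve([1.0, 0.0], toy_runs))  # [1.0, 0.0]
```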



Fig. 2 refers to the case of a time-varying system. Both the number of nonzero elements of $x_*$ and the
values of the system's coefficients are allowed to undergo sudden changes. This is a typical scenario used in
adaptive filtering in order to study the tracking performance of an algorithm in practice. The system used in
the experiments is of dimension 100. The system change is realized as follows: for the first 500 time instants,
the first 5 coefficients are set equal to 1. Then, at time instant 501, the #2 and #4 coefficients are set equal
to zero, and all the odd coefficients from #7 to #15 are set equal to 1. Note that the sparsity level changes
at time instant 501, and it becomes 8 instead of 5. The results are shown in Fig. 2 with the noise variance
being set equal to $\sigma_\zeta^2 := 0.1$.
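
The system change described above can be written down explicitly. The sketch assumes that coefficients are numbered from #1, that time instants start at 1, and that every coefficient not mentioned in the text is zero; the function name is hypothetical.

```python
def time_varying_system(n, length=100):
    """Return the system vector at time instant n (1-based)."""
    x = [0.0] * length
    if n <= 500:
        active = range(1, 6)                       # coefficients #1, ..., #5
    else:
        active = [1, 3, 5] + list(range(7, 16, 2))  # #2 and #4 removed, odd #7..#15 added
    for k in active:
        x[k - 1] = 1.0                             # shift to 0-based indexing
    return x


def sparsity(x):
    """Number of nonzero coefficients."""
    return sum(1 for xi in x if xi != 0.0)


print(sparsity(time_varying_system(500)), sparsity(time_varying_system(501)))  # 5 8
```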

Notice also here the similarity in the performance of "Proposed" and "Proposed with exact projection
mapping". Moreover, "RZ-LMS" shows better tracking ability than "OCCD-TNWL" when the forgetting
factor of the latter is set equal to 0.999. In order to raise the tracking ability of "OCCD-TNWL", the method should be
able to easily "forget" the remote past and concentrate on recent variations of the system. This is achieved
by reducing the forgetting factor, at the expense of an increased error floor. We chose the value of 0.96 for the
forgetting factor of "OCCD-TNWL" in order to achieve an error floor similar to that of the "Proposed" method, for
both of the employed sparse systems.
Figure 2. Tracking performance for a time-varying sparse system $x_* \in \mathbb{R}^{100}$. The system changes suddenly at the #501 time instant. Here, as in Fig. 1, $\mathrm{MSD}(n) := \frac{1}{R}\sum_{r=1}^{R} \|x_* - u_n^{(r)}\|^2$, $\forall n \in \mathbb{N}$, where $R$ is the total number of independent runs of the experiment. Similarly to Fig. 1, $R := 300$.